STUDIES IN INCIDENTAL LEARNING: VI.
INTRASERIAL INTERFERENCE 1


LEO POSTMAN AND PAULINE AUSTIN ADAMS 


University of California


This paper presents a group of 
experiments concerned with intra- 
serial interference under conditions of 
intentional and incidental learning. 
Previous findings (16, 17, 18) sug- 
gested certain systematic differences 
between intentional and incidental 
learners with respect to the develop- 
ment and effects of intraserial inter- 
ference. 

When intentional and incidental 
learners are tested for retention of a 
series of items, the intentional Ss 
typically recall a larger number of 
items than the incidental Ss. At the 
same time, intentional learning may 
produce more associative interference. 
The effort to fixate and connect all the 
items of the series is conducive to both 
stimulus and response generalization. 
Incidental learning is usually limited 
to a few selected items and should 
result in less intralist generalization. 
It follows that increases in the simi- 
larity of the items, which enhance 
intraserial interference, should have 
more adverse effects on serial learning 
under intentional than incidental conditions.
This is tested in Exp. IA and IB.

Reminiscence in serial rote learning
has been related to the amount of
associative interference developed be-
fore the rest interval. Following
massed practice Hovland found con-
siderably more reminiscence in the
central section than at the ends of a
serial list (7). Since the central
section of the list is spanned by a
larger number of remote associations
than are the ends, one possible inter-
pretation of this finding is that dif-
ferential forgetting of remote asso-
ciations produces more beneficial
effects in the center than at the ends
of the series. By the same token, if
intentional practice results in more
intraserial interference than inci-
dental practice, intentional learners
should show more reminiscence than
incidental learners. This prediction
is tested in Exp. II.

Remote associations built up during
practice are an important source of
intraserial interference. As a direct
test of the assumptions about intra-
serial interference, Exp. III presents
an analysis of remote associations
after different amounts of intentional
and incidental practice.


1 This research was facilitated by a grant from
the Behavioral Sciences Division of the Ford
Foundation. We are grateful to Miss Estelle
von Ende for her assistance.


Experiment IA


Method 


Materials.—Six 14-syllable lists constructed 
by Underwood (21) were used. Three had high 
intralist similarity, and three had low intralist 
similarity. All syllables had Glaze (4) asso- 
ciation values of 46.67-53.33%. Half the Ss 
were trained with lists of high similarity, the 
other half with lists of low similarity. Each S 
learned only one list, but the three lists of each 
type were used equally often. 

Conditions of learning.—A procedure de-
scribed by Brown (1) was used to expose in-
tentional and incidental Ss to the learning
materials under identical conditions. The Ss 
were informed that the experiment was con- 
cerned with the investigation of the effects of 
fatigue on speech. A series of 14 nonsense 
syllables was exposed for six trials in the window 
of a Hull memory drum at a 2-sec. rate, with 10 
sec. between trials. As each syllable appeared,
S spelled it aloud, letter by letter. In spelling 
the syllables, S spoke into a microphone con- 
nected with a wire recorder. Intentional 
learners were given the additional instruction 
to learn as many of the syllables as possible, 
but no reference was made to memorization of 
the serial order. 

A 5-min. test of free recall was given 30 sec. 
after the end of practice. The Ss were in- 
structed to write down as many syllables as they 
could remember, regardless of the order of 
presentation. 

Subjects.—With two degrees of intralist simi-
larity and two types of instructions, the experi-
mental design included four groups of 24 Ss each.
The Ss were assigned to the groups by means of 
a table of random numbers, with correction for 
equal N’s at the end. A table of random 
numbers was also used to determine the order 
in which the three different lists of a given degree 
of similarity were presented, with the restriction 
that each list was used equally often. Six inci- 
dental Ss (three in each similarity condition) 
attempted to memorize the syllables and were 
replaced. All Ss were undergraduate students 
at the University of California. 


Results 


Amount recalled.—Table 1 shows
the average number of items repro-
duced correctly on the test of free
recall. An analysis of variance of the
recall scores is presented in Table 2.
Intentional learners recall signifi-
cantly larger amounts than incidental
learners. Intralist similarity has only
limited effects on amount of free
recall. For both instructions com-
bined, the difference between the two
lists falls short of statistical signifi-
cance. When intralist similarity is
low, there is a relatively large differ-
ence in favor of intentional learners.
When intralist similarity is high, the
recall scores of the two groups con-
verge. The interaction between in-
structions and similarity is, however,
not significant.


TABLE 1

Mean Number of Items Correctly
Reproduced in Free Recall

Intralist Similarity    Intentional    Incidental

Mean SD

Low 6.25 2.57
1.87
High 4.88

Serial position effects.—Figure 1 
shows frequency of recall as a function 
of serial position. These curves, 
which show the distribution of items 
in free recall, are less regular than 
those usually obtained by the method 
of anticipation. Nevertheless, there 
are pronounced primacy and recency 
effects. 

In agreement with earlier results
(16), it is found that the extent of
these effects differs for the two kinds
of learners. Intentional learners
show considerable rises both at the
beginning and end of the list, but
primacy effects tend to be greater
than recency effects. Incidental
learners, on the other hand, clearly
show greater recency than primacy
effects. In spite of a lower over-all
level of retention, incidental Ss equal
or surpass intentional Ss in recall for
the final section of the list. Raffel
(19) has suggested that primacy
effects in free recall depend on re-
hearsal of the beginning of the series
during practice. The present results
are not inconsistent with her hy-
pothesis if it is assumed that instruc-
tions to learn favor rehearsal of the
beginning of the list. Deese and
Kaufman (3) have shown, however,
that the pronounced terminal rise in
the serial position curve is limited to
the free recall of lists in which there
is no sequential association between
adjacent words. The differences be-
tween the serial position curves of the
two kinds of learners may, therefore,
obtain only with randomly arranged
lists.


TABLE 2

Summary of Analysis of Variance
of Recall Scores

Instructions
Similarity
I × S
Error

* P < .01.


Fig. 1. Frequency of items in free recall as a
function of serial position.

Change in intralist similarity has 
only small effects on the serial position
curve of the incidental learners. 
Increase in intralist similarity does 
have pronounced detrimental effects 
on the intentional Ss’ recall for the 
middle of the series. Thus, when 
intralist similarity is low, intentional 
learners surpass incidental learners at
all but one of the early and middle
positions (1-11). When intralist
similarity is high, the intentional 
group remains superior to the inci- 
dental group at the early positions but 
loses most of its advantage in the 
middle of the list, which is most 
sensitive to intraserial interference. 

The differential effects of serial 
position are more sensitive to vari- 
ations in intralist interference than 
is a simple summation of the items 
recalled. The method of serial an- 
ticipation should, therefore, be more 
effective than the method of free 
recall in demonstrating differential 
effects of intralist similarity on in- 
tentional and incidental learning. 
The method of serial anticipation 
was used in Exp. IB. 


Experiment IB
Method 


Procedure.—The materials and conditions of 
training were exactly the same as in Exp. IA, 
but the method of serial anticipation rather than 
the method of free recall was used to measure 
retention. After the sixth presentation of the 
series, Ss were instructed to start anticipating 
each syllable before it appeared in the window 
of the memory drum. The anticipation pro- 
cedure began 1 min. after the last pretraining 
trial. The first syllable of each list was used as 
a cue item so that the anticipation task consisted 
of 13 rather than 14 syllables. Learning was 
continued to a criterion of one perfect repetition. 

Warm-up control groups.—To control for 
warm-up effects and obtain a baseline for evalu- 
ating the effects of the preliminary training, two 
control groups of 24 Ss each were given six 
practice trials with a list of 14 three-digit 
numbers and then shifted to the anticipation 
task. One of these control groups learned the 
high-similarity lists, the other group learned the 
low-similarity lists. 

Subjects.—The Ss were 144 undergraduate
students. The same procedure as in Exp. IA
was followed in assigning Ss to groups of 24
each.2 A considerable number of Ss in the
original groups, particularly of those learning
the high-similarity lists, failed to reach the
criterion of one perfect repetition. For purposes
of measuring recall on the first two trials, all the
Ss in the original sample were used, regardless of
their subsequent performance. In the further
analysis of anticipation learning, all those Ss
were used who reached a criterion of 7/13 correct
and were carried for at least five trials thereafter.
Sixteen Ss failed to reach this criterion and were
replaced. Of these, 11 were in the high-simi-
larity condition (four intentional, two incidental,
and five control Ss), and five in the low-similarity
condition (two intentional, no incidental, and
three control Ss). The use of a 7/13 rather than
a 13/13 criterion served to minimize the non-
random selection of Ss due to the difference in
difficulty of the two types of list.


2 There was one deviation from this procedure.
The Warm-up Control Groups were incorporated
into the design after a number of experimental
Ss had been run.


Results 


Recall.—Table 3 shows the mean
number of correct responses on the
first two anticipation trials. On Trial
1, intentional learners surpass inci-
dental learners in the recall of both
lists. This superiority may be due
in some measure to a greater readiness
of the intentional learners to perform
the anticipation task. The two types
of learners differ with respect to the
changes from Trial 1 to Trial 2.
Intentional learners show considerable
improvement in recall of the low-
similarity list and a smaller increase
for the high-similarity list. Inci-
dental learners show a relatively
greater improvement for the high-
similarity list than for the low-simi-
larity list. Thus, on Trial 2 the
difference between the two kinds of
learners is considerably greater for
the low-similarity list than for the
high-similarity list.


TABLE 3

Mean Number of Correct Responses on
First Two Anticipation Trials and Mean
Number of Trials to Criterion of 7/13

Intralist     Trial 1                 Trial 2                 Trials to 7/13 Criterion
Similarity    Mean    SD              Mean    SD              Mean     SD

Intentional
  Low         1.00    1.02            2.17    1.24            12.67    5.29
  High        .42     .70             .79     [illegible]     20.71    10.88

Incidental
  Low         .67     1.02            .79     1.12            18.71    11.48
  High        .25     [illegible]     .71     .79             21.17    9.47

Control
  Low                                 .29     .54             15.75    7.18
  High                                .29     [illegible]     20.00    6.43

The recall scores of the experi-
mental groups on the first two antici-
pation trials were subjected to an
analysis of variance. Since the means
and variances were highly correlated,
the Freeman-Tukey square-root
transformation ($\sqrt{x} + \sqrt{x + 1}$)
was used (15, p. 326 f.). All the F
ratios in this analysis are based on 1
and 92 df. The over-all difference
between intentional and incidental
learners is clearly significant
(F = 9.54, P < .01), as are the
differences between the two types of
lists (F = 12.19, P < .001), and the
difference between Trial 1 and Trial 2
(F = 24.28, P < .001). There is also
a significant interaction between In-
structions and Similarity (F = 5.97,
.01 < P < .02). Summed over both
trials, the difference between inten-
tional and incidental learners is sig-
nificantly greater when intralist simi-
larity is low than when it is high.
Finally, the significant second-order
interaction, Instructions × Similarity
× Trials (F = 6.08, .01 < P < .02),
reflects the fact that the difference
between intentional and incidental
learners depends on both intralist
similarity and on the point at which
recall is measured. The differential
effects of the preliminary training are
brought out more clearly on Trial 2
than on Trial 1, i.e., after Ss have
been introduced to the anticipation
procedure.


Fig. 2. Mean number of correct responses and overt errors per trial for successive fifths of
learning period leading to criterion of 7/13.
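
The square-root transformation applied to the recall scores can be illustrated with a short computation. The following is a minimal sketch rather than the authors' own procedure: the function name `freeman_tukey_sqrt` and the sample scores are hypothetical and serve only to show how the transformation treats small counts such as the number of correct anticipations on an early trial.

```python
import math


def freeman_tukey_sqrt(count):
    """Freeman-Tukey square-root transform: sqrt(x) + sqrt(x + 1)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return math.sqrt(count) + math.sqrt(count + 1)


# Hypothetical recall scores, for illustration only (not data from the study).
example_scores = [0, 1, 2, 4]
transformed = [freeman_tukey_sqrt(x) for x in example_scores]
```

A score of 0 maps to 1.0, and equal raw steps produce progressively smaller steps on the transformed scale, which is the property that tends to reduce the dependence of the variance on the mean.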

Trial 2 served as the first recall test 
for the control Ss. On that trial the 
control groups recalled less than either 
of the experimental groups. These 
scores of the experimental and control 
groups are not directly comparable, 
however, since the first trial was a 
recall test for the experimental Ss 
and provided initial exposure to the 
series for the control Ss. 

Trials to criterion.—Table 3 pre- 
sents the mean number of trials to a 
criterion of 7/13 for the experimental 
and control groups. As the values 
of the SD’s indicate, the distributions 
were extremely variable. Analysis of
variance (following logarithmic trans-
formation to remove heterogeneity of
variance) shows that the high-simi-
larity lists were learned significantly
more slowly than the low-similarity
lists (F = 14.23, df = 1 and 138,
P < .001). Condition of preliminary
training was not a significant source
of variance (F = 1.86, df = 2 and
138), nor was the interaction of
conditions of training and intralist
similarity (F = .94). Thus, the sheer
number of trials to criterion is not
sensitive to differences in preliminary
training. Significant effects emerge,
however, when the course of learning
to criterion is considered.

Correct responses.—In Fig. 2 correct
responses per trial and overt errors
per trial are plotted for successive
fifths of the period leading to the
criterion of 7/13 correct. Since a
criterion of partial mastery was used, 
the curves reflect the effects of pre- 
training only on the initial phases of 
anticipation learning. To evaluate 
the significance of the differences 
among the curves, the trend analysis 
described by Grant (5) was used. In 
order to remove heterogeneity of 
variance, the raw scores were con- 
verted into percentages of the total 
number of items in the list (13), and 
the arc-sine transformation was ap- 
plied to the distributions of per- 
centages. Summaries of the trend 
analyses are presented in Table 4. 
Since our analysis is concerned with 
the over-all rates of improvement, 
tests are presented only for the linear 
components of the trends. 
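
The two-step conversion described above (raw score to percentage of the 13 items, then arc-sine transformation) might be sketched as follows. The text does not state which form of the arc-sine transformation was applied; the common form, the arcsine of the square root of the proportion, is assumed here, and the function name `arcsine_transform` is hypothetical.

```python
import math

LIST_ITEMS = 13  # items to be anticipated in each list (first syllable served as cue)


def arcsine_transform(correct, n_items=LIST_ITEMS):
    """Convert a raw score to a proportion, then to an angle in radians.

    Assumes the arcsin(sqrt(p)) form of the transformation.
    """
    if not 0 <= correct <= n_items:
        raise ValueError("score must lie between 0 and n_items")
    proportion = correct / n_items
    return math.asin(math.sqrt(proportion))
```

Under this assumed form, a score of half the list corresponds to an angle of pi/4, and scores near 0 or 13 are stretched relative to scores near the middle of the range.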

The level of correct responses varies 
significantly as a function of both 
intralist similarity and pretraining 
instructions. The interaction of these 
variables is significant as well. The 
number of correct responses is (a) 
larger for the low-similarity lists than 
for the high-similarity lists, and (b) 
larger for the intentional groups than 
for the incidental and control groups. 
The two latter groups are close to-
gether. The differences among the
groups are greater when intralist 
similarity is low than when it is high. 

There is no difference in slope as a 
function of intralist similarity. The 
differences in slope produced by pre- 
training instructions are significant. 
This effect seems to be due primarily 
to the results obtained with the low- 
similarity lists, where the curve of the 
intentional group has a considerably 
gentler slope than that of the other 
two groups. The slopes of the three 
groups are more alike when intralist 
similarity is high. The variations in 
slope associated with the interaction 
of pretraining instructions and intra- 
list similarity fall short of significance, 
however. 

Overt errors.—Overt errors include 
(a) intralist intrusions, (b) partial 
responses, and (c) responses from 
outside the list. Intralist similarity 
per se does not influence the total 
number of overt errors. Using the 
same lists, Underwood (21) also 
failed to find significant differences in 
number of errors as a function of 
intralist similarity. The frequency 
of errors varies significantly as a 
function of pretraining instructions. 


TABLE 4

Summary of Trend Analyses of Correct Responses and Overt Errors

                                     Correct Responses
                                     MS          F

Between group means
  Similarity                         440.75      5.49**
  Instructions                       480.09      5.98***
  I × S                              293.84      3.66*
Between individual means             80.22
Between group linear trends
  Similarity                         3.16
  Instructions                       434.90
  I × S                              74.36
Between individual linear trends     32.54



















The interaction of pretraining and 
similarity is also significant. For 
both types of lists, the experimental 
groups give a larger number of overt 
errors per trial than the control group, 
with the curve of the intentional 
learners lying above that of the 
incidental learners. Preliminary
training familiarizes Ss with the 
items in the list and increases their 
readiness to make overt errors. The 
over-all differences between the ex- 
perimental and control groups are 
greater for the low-similarity than 
for the high-similarity lists. 

With the criterion defined by mas- 
tery of about half the list, there is 
only little reduction in overt errors 
during the learning period leading to 
the criterion. Both experimental 
groups show slight declines in the 
number of errors between the initial 
and terminal period, whereas the 
curves of the control groups show a 
gradual rise. This rise probably re- 
flects the familiarization of the control 
Ss with the items in the list. The 
differences in linear slope as a function 
of pretraining are highly significant. 
The rate of change in errors does not 
vary significantly as a function of 
intralist similarity. 

Analysis of transfer effects.—The 
trend analyses suggest that pre- 
training may influence subsequent 
anticipation learning in three ways: 
(a) by familiarizing Ss with the 
individual items, (b) by establishing 
some correct serial connections, and 
(c) by building up intralist inter- 
ference. The net effect of these 
influences depends on both pretraining 
instructions and intralist similarity. 

For both types of lists, the experi- 
mental groups make more overt 
errors than the control group. When 
intralist similarity is low and the 
differentiation of the items is rela-
tively easy, the intentional group also
makes more correct responses per
trial during the initial stages of 
anticipation learning. Incidental Ss 
do not show comparable transfer 
effects. Thus, intentional pretraining 
is more beneficial than incidental pre- 
training. When intralist similarity 
is high and the differentiation of the 
items is difficult, there are no con- 
sistent differences among the groups. 
The interferences built up during
pretraining fully counteract the posi-
tive transfer effects of familiarization
for both experimental groups.


Experiment II 
Method 


Procedure.—The conditions of pretraining and 
anticipation learning were the same as for the 
intentional and incidental groups in Exp. IB, 
with two exceptions: (a) only lists of low intra- 
list similarity were used, and (b) a 2-min. rest 
period was introduced after attainment of a 
criterion of 7 out of 13 syllables correct. The 
rest period was filled with color naming. The
colors were presented in the window of the 
memory drum at a 2-sec. rate. After the rest, 
learning was continued to a criterion of one 
perfect repetition. 

Subjects.—There were 18 Ss in each group, 
assigned to the experimental conditions at 
random (cf. Exp. IA and IB). None of the Ss 
failed to reach the criterion of 7/13, and all were 
carried at least five trials beyond the criterion. 
Thus, all Ss were included in the analysis, re- 
gardless of whether or not they reached the 
criterion of one perfect repetition. Six inci- 
dental Ss, however, reported memorizing the 
syllables during pretraining and were replaced. 

Since the experiment was performed under 
the same conditions as Exp. IB, the intentional
and incidental groups learning the low-similarity 
lists in Exp. IB were used as control groups in 
the analysis of the present study. The groups 
given 2 min. of color naming will be designated 
as Rest Groups, and the control groups as No- 
Rest Groups. 


Results 


Pre-rest learning.—The mean num-
ber of trials required to reach a
criterion of 7 out of 13 syllables
correct is shown in Table 5. Analysis
of variance shows no significant
difference between the Rest and
No-Rest Groups (F = .03). The in-
tentional groups reach the criterion
of 7/13 in fewer trials than the
incidental groups, and the difference
is significant (F = 10.73, df = 1 and
80, P < .01). Note that this differ-
ence, which was not reliable in Exp.
IB, reaches significance when the
comparison is based on larger groups
of Ss. The interaction between con-
ditions of pretraining and conditions
of rest is, of course, not significant
(F = .001).


TABLE 5

Trials to Criterion of 7/13 and Differences
Between First Postcriterion and
Criterion Trials

                Trials to             First Postcriterion - Criterion
                7/13 Criterion        Recall Difference
                Mean     SD           Mean     SD

Intentional
  Rest          13.00
  No-Rest       12.67

Incidental
  Rest          19.00    6.59         -1.50    1.42
  No-Rest       18.71    11.48        -.88     1.98

The effect of rest on recall.—The
mean difference between the number 
of items recalled on the first post- 
criterion trial and the criterion trial 
is presented in Table 5. The In- 
tentional Rest Group shows a slight 
increase in the amount recalled. All 
other groups show a loss. The decre- 
ments on the first postcriterion trial 
are in agreement with the findings of 
Hovland and Kurtz (8). The use of 
a fixed criterion probably capitalizes 
on spurts of performance which are 
not maintained due to oscillation at 
recall (8, 13). 

The effect of rest on recall varies
as a function of preliminary training.
When the changes between the cri-
terion trial and the first postcriterion
trial are compared, the difference
between the Intentional Rest Group
and the Intentional No-Rest Group
is found to be significant (t = 2.19,
df = 40, .02 < P < .05). The dif-
ference between the two incidental
groups, on the other hand, is not
significant (t = 1.11). Relative to
the performance of the control groups,
the introduction of a rest has bene-
ficial effects after intentional pre-
training but not after incidental pre-
training. Analysis of variance shows
the interaction between conditions of
training and conditions of rest to be
significant (F = 5.65, df = 1 and 80,
.02 < P < .05). The introduction of
a rest interval has significantly dif-
ferent effects after intentional and
incidental pretraining.

Persistence of the effects of rest.—To
determine the stability of the effects
of rest, the comparison of the experi-
mental and control groups was ex-
tended over the first four trials after
criterion. Figure 3 shows the mean
number of correct responses on these
trials. The Intentional Rest Group
continues to give a larger number of
correct responses than the Intentional
No-Rest Group; in fact, the difference
on the fourth postcriterion trial is
somewhat greater than on the first.
Thus, the difference between the two
intentional groups remains significant
when the mean numbers of correct
responses per trial are compared
(t = 2.33, df = 40, P = .02). The
difference between the two incidental
groups does not persist beyond the
first postcriterion trial, and they do
not differ with respect to the mean
number of correct responses per trial
(t = .12). Analysis of variance of
the correct responses per trial shows
a significant interaction of conditions
of pretraining and conditions of rest
(F = 11.69, df = 1 and 80, P < .01).

Serial position effects and overt 
errors.—It has been suggested that the
differential effects of rest may be due 
to differences in the amounts of intra- 
serial interference produced by in- 
tentional and incidental training. 
Presumably, intentional learners de- 
velop more intraserial interference 
than do incidental learners and thus 
derive greater benefit from the dif- 
ferential forgetting of incorrect asso- 
ciations during the rest interval. If 
these assumptions are correct, it 
follows that (a) the largest difference
between the Intentional Rest and
No-Rest Groups should occur in the 
middle of the list, and (b) the Inten- 
tional Rest Group should make fewer 
overt errors than the Intentional 
No-Rest Group. Such differences 
should be present to a lesser degree 
for the incidental groups. 

Analysis of the responses on the 
postcriterion trials lends some support 
to these expectations. Figure 4 shows 
serial position curves for the mean 
number of correct responses on the 
first four postcriterion trials. For the
intentional groups, the curve of the Rest
Group lies above
that of the No-Rest Group at all but 
one of the middle positions in the 
list; the two curves coincide at the 
beginning and end of the list. There 
are only small and irregular differences 
between the curves of the two inci- 
dental groups, although there is a 
slight tendency for the Rest Group to 
lie below the No-Rest Group in the 
middle positions. When such curves 
are plotted for the first postcriterion 
trial only, the picture is substantially 
similar except that the curves are 
somewhat more irregular. 

Table 6 presents the mean number
of overt errors per trial on the first
four postcriterion trials. After in-
tentional training, the Rest Group
makes somewhat fewer errors than
the No-Rest Group, but the difference
is clearly not significant (t = .46).
After incidental training, on the other
hand, the Rest Group makes more
overt errors than the No-Rest Group,
but this difference is again not sig-
nificant (t = 1.00). To assess the
significance of the differential effects
of rest on the two kinds of learners,
an analysis of covariance of the error
scores was performed. This analysis
takes account of the fact that inten-
tional learners made a larger number
of overt errors than incidental learners
on the criterion trial as well as on the
postcriterion trials. The interaction
of conditions of pretraining and con-
ditions of rest is significant (F = 5.22,
df = 1 and 79, .01 < P < .02).
Thus, the analysis of overt errors
yields statistically significant results
only when both kinds of learners are
compared simultaneously, i.e., when
two opposite effects of rest are com-
bined in the same analysis. As
Table 6 shows, the pattern of error
scores is similar when only the first
postcriterion trial is considered, except
that the differences between experi-
mental and control groups are some-
what larger. Statistical tests of the
error scores on the first postcriterion
trial yield the same conclusions as
the tests of the mean scores on four
trials.


TABLE 6

Mean Number of Overt Errors on First
Postcriterion Trial, and Mean Number
of Overt Errors per Trial on First
Four Postcriterion Trials

                First               Four
                Postcriterion       Postcriterion
Cond.           Trial               Trials

Intentional
  Rest          1.94                1.52
  No-Rest       2.29                1.73

Incidental
  Rest          2.50                1.42
  No-Rest       1.88                1.20

The serial position curves seem to 
favor the hypothesis that the differ- 
ential effects of rest reflect variations 
in intraserial interference. The evi- 
dence provided by the analysis of 
overt errors is at best circumstantial. 
Thus, the results are not inconsistent 
with the hypothesis but do not con- 
firm it conclusively. 


Experiment III
Method 


Conditions of learning.—The experiment was 
introduced as a study of individual differences 
in the pronunciation of English words. A series 
of 16 adjectives was presented for either 8 or 16 
trials in the window of a Hull memory drum at a
2-sec. rate, with 4 sec. between trials. As each
word appeared, S read it aloud, speaking into 
the microphone of a wire recorder. Intentional 
learners were given the additional instruction to 
learn as many of the adjectives as possible, but 
no reference was made to memorization of the 
serial order. 

Materials.—The list of adjectives was one of 
those used by Wilson (23) in his study of remote 
associations. The series consisted of non- 
synonymous, two-syllable adjectives all of which 
had different initial letters. All words had a 
scale value of 4.25 or above on Haagen’s (6) 
5-point scale of familiarity. There were three 
different random orders of the list which were 
used equally often. 

Association tests.—One minute after the last 
presentation of the series a test for remote 
associations was given. The association method
(12, p. 104 f.) was used. The items were pre-
sented at a 4-sec. rate in a random sequence
different from that used in training. Three 
different test orders were used equally often. 
The Ss were instructed to respond to each word 
with the first word from the list that came to 
mind. 

Subjects.—With two kinds of instructions and
two frequencies of training trials, there were four 
groups of 18 Ss each. Pairs of Ss—one in- 
tentional and one incidental—were assigned to 
the two frequency conditions by means of a 
table of random numbers, with correction for 
equal N’s at the end. In one-half of the cases
the incidental S in a pair was run before the
intentional S, in the other half of the cases the 
order was reversed. Six incidental Ss—two in 
the 8-presentation group and four in the 16- 
presentation group—attempted to memorize the 
syllables and were replaced. 


Results 


Scoring of remote associations.—In 
scoring remote associations, we fol- 
lowed the procedure described by 
Wilson (22). Responses on the test 
were classified as (a) adjacent forward 
associations, (b) remote forward asso- 
ciations, (c) remote backward as- 
sociations and (d) failures to respond. 
There was only one instance of a 
response from outside the list, which 
was classified as a failure. Adjacent 
backward associations were classified 
as remote associations. The distance
between the stimulus word and the 
response word in the learning series 
was used to determine the degree of 
remoteness of an association. Fol- 
lowing Wilson, we classified an asso- 
ciation from Position 16 to Position 1 
as an adjacent forward association, 
and an association from Position 1 to 
Position 16 as an adjacent backward 
association. The procedure prevents 
an “artificial inflation” of remote 
associations of 14-degree remoteness. 
The number of such cases was, how- 
ever, very small. 
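
The scoring rules just described can be sketched in a few lines. This is an illustration only, not the authors' procedure: the function name is hypothetical, the degree of remoteness is assumed to be the number of items lying between stimulus and response, and a response that merely repeats the stimulus word is assumed to count as a failure.

```python
LIST_LENGTH = 16  # the text refers to Positions 1 through 16


def classify_association(stimulus_pos, response_pos, n=LIST_LENGTH):
    """Return (category, degree_of_remoteness) for one test response.

    Positions are 1-based serial positions in the learning series.
    Degree of remoteness is taken as the number of items between
    stimulus and response (an assumed counting convention).
    """
    if not (1 <= stimulus_pos <= n and 1 <= response_pos <= n):
        raise ValueError("positions must lie within the list")
    if stimulus_pos == response_pos:
        # Assumption: repeating the stimulus word is not an association.
        return ("failure", None)
    # Wrap-around rule: last -> first is adjacent forward,
    # first -> last is adjacent backward.
    if (stimulus_pos, response_pos) == (n, 1):
        return ("adjacent forward", None)
    if (stimulus_pos, response_pos) == (1, n):
        return ("remote backward", 0)
    step = response_pos - stimulus_pos
    if step == 1:
        return ("adjacent forward", None)
    if step > 1:
        return ("remote forward", step - 1)
    # Adjacent backward responses fall here with degree 0 and are
    # counted among the remote associations, as in the text.
    return ("remote backward", -step - 1)
```

Without the wrap-around rule, a response from Position 1 to Position 16 would be scored as a forward association of 14-degree remoteness, which is the inflation the procedure is meant to prevent.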

Frequency of remote associations.—Table 7
shows the mean number of
adjacent forward associations, remote 
associations, and failures to respond. 
Remote forward and backward asso- 
ciations occurred with almost equal 
frequency and are combined in the 
table. Since failures to respond were 
infrequent and did not vary signifi- 
cantly from condition to condition, 
there is a reciprocal relationship be- 
tween the number of adjacent and 
remote associations. The discussion 
of the experimental results will, therefore,
be confined to remote associations.

TABLE 7

Mean Numbers of Different Types of Association

[Column headings: Adjacent Forward (Mean, SD); Remote (Mean, SD); Failures (Mean, SD). Rows: intentional and incidental learners after 8 and 16 presentations. Most cells are illegible in the source. Legible fragments: "2.00 | 2.08"; "1.44 | 2.04"; "8 33 | 1.46 | 12.39 | 2.48"; "16 ... 1.66".]


Incidental learners give a larger 
number of remote associations than 
intentional learners, and the difference 
between the two groups increases with 
exercise. For intentional Ss, the fre- 
quency of remote associations de- 
creases with exercise, in agreement 
with the results of association tests 
given after different degrees of antici- 
pation training (22). For incidental 
Ss, on the other hand, the number of 
remote associations shows some in- 
crease as a function of practice. 
Analysis of variance shows the difference
between the two kinds of learners to be
significant (F = 4.79, df = 1 and 68,
.02 < P < .05). For both groups combined,
the difference between the two degrees of
practice is not significant (F = .22). The
interaction of instructions and exercise
approaches, but does not reach, significance
(F = 3.65, df = 1 and 68, .05 < P < .10).

Degree of remoteness.—Figure 5 
shows the frequency of remote asso- 
ciations as a function of the degree 
of remoteness. Forward and back- 
ward associations of equal degrees of 
remoteness have been combined. 
Adjacent backward associations are 
represented as associations of 0-degree 
remoteness. The number of possible
associations decreases with the degree
of remoteness, and it has been cus- 
tomary to weight the observed fre- 
quencies in order to correct for the 
differential in opportunity (2, 20, 23). 
Since our primary interest is in the 
differences among the groups, we 
have not applied any such weights. 
The decrease in the unweighted 
number of associations as a function 
of degree of remoteness is in agree- 
ment with the results of tests given 
after anticipation learning (2, 11, 23). 

Intentional learners give a higher 
proportion of associations of a low 
degree of remoteness than do inci- 
dental learners. Increase in the fre- 
quency of exercise reduces the rela- 
tive frequency of associations of a 
high degree of remoteness for both 
groups. Such an inverse relationship 
between amount of training and 
degree of remoteness of associations 
has been reported in some studies of 
anticipation learning (9, 10), but not 
in others (14, 23). To evaluate the 
significance of these trends, the mean 
degree of remoteness was determined 
for the remote associations of each S 
and the distributions of these “remoteness
scores” were subjected to
an analysis of variance. After 8 
presentations, the mean remoteness 
score was 4.39 for the intentional
group, and 5.03 for the incidental
group. After 16 presentations the 
means dropped to 3.41 for the inten- 
tional group and 4.33 for the inci- 
dental group. The difference be- 
tween the two kinds of learners is 
significant (F = 6.76, df = 1 and 68,
.01 < P < .02), as is the difference
between the frequencies of exercise
(F = 7.87, df = 1 and 68, P < .01).
The interaction of instructions and 
exercise is not significant (F < 1). 
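
The pattern of main effects and interaction can be read directly from the four cell means quoted above. The following sketch is only arithmetic on those reported means; the names are illustrative, and it says nothing about the error terms used in the analysis of variance.

```python
# Cell means of the remoteness scores, as reported in the text.
CELL_MEANS = {
    ("intentional", 8): 4.39,
    ("incidental", 8): 5.03,
    ("intentional", 16): 3.41,
    ("incidental", 16): 4.33,
}


def marginal_mean(position, level):
    """Average the cell means that share one level of one factor.

    position 0 selects the instruction factor, position 1 the number
    of presentations. Unweighted averaging is exact here because every
    group contains the same number of Ss (18).
    """
    values = [m for key, m in CELL_MEANS.items() if key[position] == level]
    return sum(values) / len(values)


instruction_means = {k: marginal_mean(0, k) for k in ("intentional", "incidental")}
exercise_means = {k: marginal_mean(1, k) for k in (8, 16)}

# Interaction contrast: change in the incidental-minus-intentional gap
# from 8 to 16 presentations.
gap_8 = CELL_MEANS[("incidental", 8)] - CELL_MEANS[("intentional", 8)]
gap_16 = CELL_MEANS[("incidental", 16)] - CELL_MEANS[("intentional", 16)]
interaction_contrast = gap_16 - gap_8

print(instruction_means, exercise_means, round(interaction_contrast, 2))
```

The marginal means differ by about .78 between the two kinds of learners and by about .84 between the two frequencies of exercise, whereas the interaction contrast is only about .28, in keeping with the significance pattern reported.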

If we assume that the strength of 
remote associations is inversely re- 
lated to degree of remoteness (cf. 2; 
12, p. 109), it follows that the average 
strength of remote associations is 
greater for intentional than incidental 
learners. Over the limited range of 
trials studied here, increase in exercise 
raises the average strength of remote 
associations for both intentional and 
incidental learners. 

Serial position effects.—Association 
tests given after anticipation training 
show the highest proportion of asso- 
ciations in response to stimuli from 
the middle of the training list (2, 23). 
The curve is flattened with increased 
learning (23). We would expect such 
serial position effects to be less pro- 
nounced when training is without 
reference to serial order. Figure 6 
shows the mean percentage of remote
associations (i.e., remote associations
expressed as percentages of the total
number of associations) for successive
blocks of four stimulus words each.
Serial positions were combined in
blocks in order to stabilize the trends.
The variations as a function of serial 
position are not extensive; remote 
associations account for a high per- 
centage of responses in all blocks. In 
general, the curves show a rise in the 
center, although the curve for in- 
tentional learners at the lower level 
of exercise is relatively flat. To test 
the significance of these trends, the 
difference between the number of 
adjacent and remote associations in 
each block was determined for each 
S, and the distributions of difference 
scores were subjected to an analysis 
of variance. Differences among 
blocks were found to be significant 
(F = 3.18, df = 3 and 204, .02 < P
< .05). The interactions of the experimental
treatments with blocks fell short of
significance, however.
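
The degrees of freedom reported in this section are what one would expect if the four groups are treated as a between-Ss classification and the four blocks as a within-Ss classification. The sketch below is only this bookkeeping; the variable names are illustrative, and the error terms are inferred from the reported df and are not stated in the text.

```python
instructions = 2    # intentional vs. incidental
frequencies = 2     # 8 vs. 16 presentations
ss_per_group = 18
list_length = 16    # Positions 1 through 16
block_size = 4      # blocks of four stimulus words

groups = instructions * frequencies
n_subjects = groups * ss_per_group
blocks = list_length // block_size

# Error df for comparisons between groups of Ss.
df_between_error = n_subjects - groups
# df for blocks and for the blocks-by-Ss-within-groups error term.
df_blocks = blocks - 1
df_blocks_error = df_blocks * df_between_error

print(n_subjects, df_between_error, df_blocks, df_blocks_error)  # 72 68 3 204
```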


Although the differences among 
blocks do not vary significantly with 
frequency and exercise, two additional 
features of the curves are worth 
noticing. First, the curves of the 
two incidental groups are quite similar 
in shape, i.e., the distribution of asso- 
ciations does not change as a function 
of exercise. Second, increase in train- 
ing does not produce a flattening of 
the curves, as it does in anticipation 
learning; in fact, the curve of the 
intentional group becomes more 
peaked with exercise. In the present 
experiment, training was carried on 
without instructions to learn the 
serial order, and adjacent forward 
associations developed slowly. Thus, 
continued training maintained or increased
the number of remote associations to
stimuli in the middle of the list, where
reduction of intraserial interference is
most difficult.


Discussion 


The experiments agree in showing 
systematic differences between the pat- 
terns of associations acquired under 
intentional and incidental conditions. 
Intentional Ss not only learn more items 
but also are more subject to intraserial 
interference than incidental Ss. The 
net difference in performance between 
the two kinds of learners depends on 
(a) the degree to which the learning task 
is conducive to intraserial interference, 
and (b) the sensitivity of the retention
test to such interference. 

Experiments IA and IB support the 
hypothesis that increase in intralist 
similarity is more damaging to inten- 
tional than to incidental learning. The 
extent of this differential effect varies 
with the measure of performance. Dif- 
ferences in the sheer amount of free 
recall are in the expected direction but 
fall short of significance. The distri- 
butions of items in free recall provide 
additional and more direct evidence for 
the hypothesis. With increase in simi- 
larity intentional Ss lose their advantage 
over incidental Ss in the middle of the 
list, which is most vulnerable to intra- 
serial interference. The results of the 
anticipation procedure bring the dif- 
ferential effects of intralist similarity 
sharply into focus. When intralist 
similarity is low, intentional Ss surpass 
incidental Ss on the initial test of recall 
and in subsequent anticipation learning. 
When intralist similarity is high, these 
differences between the two kinds of 
learners are substantially reduced. 

Experiment II shows that the intro- 
duction of a rest interval has a beneficial 
effect after intentional but not after 
incidental pretraining. Analysis of the 
postcriterion trials gives some support 
to the hypothesis that the beneficial 
effect of rest results from the dissipation 
of intraserial interferences built up dur- 
ing intentional pretraining. Incidental 
Ss are presumably less subject to such 
interference and hence derive less ad- 
vantage from the rest interval. An 
explanation of these results in terms of
work-decrement cannot be ruled out.
Hovland and Kurtz (8) have shown that
the beneficial effect of rest increases as a
function of the amount of “mental work”
prior to learning. If intentional learning 
involves a greater amount of “work” 
than incidental learning, the difference 
obtained in Exp. II could be subsumed 
under their interpretation. 

In Exp. III an attempt was made to 
obtain direct evidence for the differences 
between the associative patterns built 
up during intentional and incidental 
training. Condition of learning and 
frequency of exercise both produce sig- 
nificant differences in the distributions 
of associations. Even though the in- 
structions made no reference to serial 
order, intentional Ss give, both abso- 
lutely and relatively, more adjacent 
forward associations and fewer remote 
associations than the incidental Ss. 
This difference increases as a function of 
practice. The remote associations of the 
intentional learners tend, however, to be 
of a lower degree of remoteness, and 
presumably stronger, than those of the 
incidental learners. These results agree 
with the hypothesis that instructions to
learn serve to increase the strength of
both correct and incorrect associations. 


SUMMARY 


A group of experiments is presented com- 
paring intraserial interference in intentional and 
incidental learning. Experiments IA and IB
show that increase in intralist similarity, which 
enhances associative interference, is more 
damaging to intentional than incidental learners. 
The method of serial anticipation brings out the 
differential effects of intralist similarity more 
clearly than does the method of free recall. 
Experiment II shows that introduction of a rest 
interval has beneficial effects after intentional 
but not after incidental pretraining. The 
beneficial effects of rest may be due to the 
dissipation of intraserial interferences. Experi- 
ment III presents an analysis of remote asso- 
ciations after different degrees of intentional and 
incidental practice. Intentional Ss give more 
adjacent forward associations and fewer remote 
associations than do incidental Ss. This differ- 
ence increases with practice. The remote 
associations of intentional Ss are of a lower 
degree of remoteness, and hence are likely to
be stronger, than those of the incidental Ss.
Thus, the experiments support the conclusion 
that instructions to learn enhance the strength 
of both correct and incorrect associations.


AN ANALYSIS OF MAIER’S PENDULUM PROBLEM ¹


PER SAUGSTAD 


University of Oslo 


In a previous report (3) the writer 
reached certain conclusions about the 
problem-solving process that in some 
respects seemed to conflict with con- 
clusions drawn by Maier (2). To 
clarify the issue the experiment on 
which Maier based his conclusions 
was repeated under two sets of con- 
ditions (Exp. I). Since the results 
of this repetition agreed neither with 
Maier’s results nor with the writer’s 
expectations, the experiment was re- 
peated under conditions allowing for 
more rigorous experimental control 
(Exp. II). Experiment II yielded
information on various factors operating
in the experimental situation,
but as the results still could not be
considered decisive, additional
changes in the experimental situation
were introduced (Exp. III).


Maier (2) presented his Ss with the following 
problem. In a hallway two crosses are marked 
on the floor, a specific distance apart. With the 
help of four wooden poles, two pieces of wire, 
one table clamp, two burette clamps, eight 
pieces of lead tubing, and several pieces of chalk, 
S was given the task of constructing two pendu- 
lums swinging immediately above the two 
crosses. The two pendulums were to be constructed
in such a way that each had a piece of
chalk attached to it which would mark the floor 
at the points indicated by the two crossmarks. 

To achieve the solution scored as correct by 
Maier, a T-shaped construction had to be 
erected between floor and ceiling. This was 
done by placing the longest pole (the length of 
which exactly spanned the distance between the 
two crossmarks) flat against the ceiling, and 
holding it in position with two other poles.
These two poles were turned into a longer pole
with the help of a table clamp. The length of
the longer pole was adjusted so that it just
reached from the floor to the middle of the pole
placed flat against the ceiling. The two wires
were then suspended from the ends of the
horizontally placed pole. To make a pendulum
out of the two wires, pieces of lead tubing and a
burette clamp were fastened to each of the free
ends of the wire. Finally a piece of chalk was
secured to each of the two pendulums by a
burette clamp.

¹ The writer is indebted to The Norwegian
Research Council for Science and the Humanities
for a grant to carry out these investigations and
to Arnold Havelin, candidate in psychology, for
help in collecting the data in Exp. I and to
Bjarne Kvilhaug, candidate in psychology, for
help in collecting the data for Exp. II and III.

Prior to the problem situation some of 
Maier’s Ss were given three demonstrations, 
each of which involved a principle: Parts A, 
B, and C. In Part A, S was shown how to
make a plumb line by tying a burette clamp to a
string and then attaching a pencil to the burette
clamp. In Part B, S was shown how to turn
two short poles into a longer one by means of a
table clamp. In Part C, E demonstrated how
a screen could be set up in a doorway by making 
a horizontal T shape. In this demonstration E 
first showed how a screen could be held taut 
vertically along one side by pressing a pole flat 
against the wall of a doorway. Then he showed 
how this pole could be held in position flat 
against the side of the doorway by wedging 
another pole against its center and the opposite 
wall of the doorway. 

Before being presented with the problem 
situation some Ss were also given what Maier 
called Direction. This was introduced to S by
indicating how simple the problem would have 
been if the two pendulums could be suspended 
directly from nails in the ceiling. 

Maier used five groups for his experiment. 
Group I was presented with the problem situ- 
ation without any additional aids. Group II 
was in advance given the three demonstrations 
described above. Group III was treated as 
Group II with the exception that Ss were told 
that, in trying to reach a solution, they were to 
use the principles involved in the three demon- 
strations. Group IV was given Direction prior 
to the problem situation. Group V was treated 
as Group III, and also given Direction. 

In Groups I, II, and IV Maier obtained no
solution, in Group III one, and in Group V
eight solutions. The difference between Groups
V and III is significant at the 1% level when
tested by Fisher’s exact method. Maier argued
that the Ss of Group III ought to have the necessary
experiences required to solve the problem,
since they had received demonstrations of Parts 
A, B, and C. When the majority of Ss in this 
group failed to solve the problem, Maier con- 
cluded that the presence of the necessary “parts” 
or “experiences” in the Ss was not sufficient to 
bring about the correct solution. In addition 
to the necessary “experiences,” the solution to 
the problem involved what he called “direc- 
tion.” Maier’s conclusion thus implies that in 
some way most Ss will encounter an obstacle 
when the “parts” have to be combined.


A logical analysis of Maier’s ex- 
perimental report reveals that his 
conclusion rests on a number of 
untested assumptions, chief among 
which are: (a) the pendulum problem 
does not involve any essential experi- 
ences in addition to those involved in 
Parts A, B, and C; (b) the demon- 
stration of the Parts was conceived 
by Ss as intended by Maier and as 
necessitated by his conclusion; (c) the 
statement of the problem was clear 
to Ss. 

Consideration of these three assumptions
led to the three experiments which are here
dealt with. Assumption a served as the point of
departure for Exp. I and Assumption
b as the point of departure for Exp.
II and III. Assumption c, possible
ambiguities in the statement of the
problem, will be discussed under
Method, Exp. I.


EXPERIMENT I 


Maier’s reasoning implies that the
demonstrations of Parts A, B, and C
contain the same “principles” or “experiences”
as those involved in the
pendulum problem. His term “experience,”
“principle,” or “part,” is highly
abstract and not easily subjected to
experimental control. In one important 
respect there appears to be a difference 
with regard to experiences involved in 
the pendulum problem and in the 
demonstrations. In the demonstration 
of Part C a horizontal T shape is constructed
against the side of a doorway.
To achieve the solution for the pendulum
problem it is necessary to construct a 
vertical T shape against the ceiling. In 
the pendulum problem the ceiling is thus 
used as a support for a pole. It would 
be reasonable to expect that the use of 
the ceiling as a support will be unavail- 
able to many Ss. The superiority of the 
group that was shown Parts and Di- 
rection over the other groups in Maier’s 
experiment, may thus be due to the fact 
that the attention of Ss in this group is 
drawn to the ceiling. 

On the assumption that this is so, one 
would expect more solutions to the 
problem if the use of the ceiling was in 
some way made more available to Ss.
Attention should not be drawn to the
possible use of the ceiling by pointing
directly to it in some way, since this
might be interpreted as another way of
introducing Maier’s Direction. Lowering
the ceiling, so that S might bump
his head on it, might lead S to shy away
from it. A better approach would
probably be to build a miniature or scale
model of the hallway used by Maier, and
place this on a table, so that S might
have a clearer view of the experimental 
situation. 

In repeating an experiment it is of 
course only necessary to use identical 
conditions to the extent necessitated by 
the conclusion drawn from the results. 
For this reason it is clear that a repetition 
in the miniature hallway must be con- 
sidered a legitimate way of repeating 
Maier’s experiment. The problem is 
still one of combining Parts A, B, and C. 

It cannot be assumed that the Nor- 
wegian students used for the repetition 
in the miniature are comparable in all 
relevant respects to the American 
students used by Maier. A control
group exposed to a situation as close as
possible to that set up by Maier, therefore,
had to be introduced. A comparison
of the performance of Ss in the
scale model and the full-size room might
furthermore provide information of value
in assessing the behavior of Ss in Maier’s
original situation.

The experiment was thus repeated
under two sets of experimental conditions:
(a) under conditions as close as
possible to those of Maier (Situation A),
(b) in a miniature hallway (Situation B).


Method 


Situation A.—A hallway was found to have 
approximately the same dimensions as the one 
used by Maier. By placing a heavy curtain 
across the hallway its length was reduced to 
about 8 m. Breadth and height were, respectively,
2.75 and 2.20 m. As in Maier’s experiment
a table (1.50 × .79 m.) was placed across
the hallway from one wall (to the right of E).
The distance from E’s side of the table to the
curtain measured 5.40 m. The hall contained
no windows, but a doorway with a closed door
(to the left of E) at a distance of 1.28 m. from
E’s side of the table. The doorway was used
for demonstration of Part C. 

The poles were adjusted to the height of the 
hall in such a way that their length was in the 
proportion of 2.20:2.52 to the length of Maier’s 
poles (Maier’s hall being 2.52 m. high). The 
poles were 2.09, 1.22, 1.04, and .87 m. long and 
their cross section was the same as described by 
Maier. The distances from the table to the
first crossmark and between the crossmarks
were adjusted to the length of the poles according
to Maier’s diagram (apparently drawn to the
scale 1:100). Lead tubing was used for
weights, and the rest of the material was prepared
according to Maier’s specifications, with
the exception that the burette clamps were
exchanged for another type of clamp of the same
size.

The ceiling of the hall was smooth and even. 
It did, however, contain an electric wire running 
along one side. If an S attempted to utilize
this, or the curtain rod, which happened a few 
times, he was told this should not be done. The 
room was illuminated by a lamp placed in the 
ceiling behind the table. 

Situation B.—A scale model, one fourth of 
the dimensions of the hallway (55 cm. high, 69 
cm. wide, and 100 cm. long), was placed on a
table. The first crossmark was placed close to
Opening 1. To prevent S from utilizing the
outside of the box for some construction a
wooden frame was placed around Opening 1
perpendicular to the sides. Close to Opening 2 
the miniature table was placed. It was firmly 
nailed to the floor to prevent tipping. The 
miniature room contained no chair. 

The poles, as well as the distances to the two 
crossmarks, were a quarter the length of those 
used in Situation A. The cross section of the 
poles was 1 cm. × .5 cm. The cross section of
the wires was the same as in Situation A. Small
pieces of lead tubing were used for weights.
Pieces of pencil lead were substituted for the 
pieces of chalk. The clamps were of the same 
type as those used in Situation A, but smaller. 
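
The two scalings described under Method, the 2.20:2.52 proportion for the poles of Situation A and the reduction to one quarter for Situation B, can be checked with a short calculation. This is a sketch only: the variable names are illustrative, and the pole lengths derived for Maier's apparatus are implied by the stated proportion, not reported in the text.

```python
HALL_HEIGHT_M = 2.20          # Situation A
HALL_WIDTH_M = 2.75           # Situation A
MAIER_HALL_HEIGHT_M = 2.52    # as given in the text
POLES_A_M = [2.09, 1.22, 1.04, 0.87]
MODEL_FRACTION = 1 / 4        # Situation B is one fourth of Situation A

# Pole lengths implied for Maier's hall by the 2.20:2.52 proportion
# (derived here, not reported in the text).
scale_to_maier = MAIER_HALL_HEIGHT_M / HALL_HEIGHT_M
implied_maier_poles_m = [round(p * scale_to_maier, 2) for p in POLES_A_M]

# Quarter-scale dimensions for the model, in centimeters.
poles_b_cm = [round(p * 100 * MODEL_FRACTION, 2) for p in POLES_A_M]
model_height_cm = round(HALL_HEIGHT_M * 100 * MODEL_FRACTION, 2)
model_width_cm = round(HALL_WIDTH_M * 100 * MODEL_FRACTION, 2)

print(implied_maier_poles_m)
print(poles_b_cm, model_height_cm, model_width_cm)
```

The model height and width computed in this way agree, after rounding, with the 55 cm. and 69 cm. given for the scale model.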

The Ss were first placed at Opening 1 for the
instructions, and then Parts A, B, and C were 
demonstrated while they were at Opening 2. 
Parts A and B were demonstrated in the wooden 
box, Part C in a doorway at the same side of the 
room as Opening 2. When the instructions 
relating to the chair were read, a chair outside 
the box was pointed to. 

Subjects.—The Ss were 88 medical students, 
aged 19-22. Since only students leaving high 
school with a top grade are admitted to the 
medical department, and since they have all had 
the same curriculum in high school, Norwegian 
medical students may be regarded as a highly 
homogeneous group with regard to mental 
abilities. 

Great care was taken to ensure that Ss should 
not communicate the correct solution to each 
other. They were individually solicited not to 
speak of the problem, and they promised not to 
do so. The E also looked for any signs in Ss 
during the problem-solving period which might 
indicate that the solutions were based on col- 
lusion. Furthermore no noticeable difference 
in the number of solutions was found toward the 
end of the testing period. 

Procedure.—The Ss were given 40 min. for 
the problem, including instructions. They were 
informed of the time limit, and if an S wanted 
to give up before this was up, he was encouraged 
to continue until the time had expired. The 
instructions were read aloud. The E recorded,
partly in code and as accurately as possible, the 
overt and verbal behavior of Ss. When Ss 
asked questions, the answer, whenever possible, 
consisted of a repetition of the relevant part of 
the instructions. 

Design.—The Ss were distributed randomly 
over Situation A and Situation B, 44 to each of 
the two situations. For each situation two 
groups were used. The No-Direction
(Group ND) was given the three Parts and 
treated as Maier’s Group III with the modifi- 
cations to be noted below. The Direction group 
(Group D) was given Parts and Direction. The 
Ss were randomized in an equal number into 
these two groups. There were five women in 
each of the four groups. 

Criterion for solution.—More than one solu- 
tion to a problem may invalidate any con- 
clusions to be drawn from an experiment on 
problem solving. For example, if there are five 
solutions to a problem, if all five have an equal 
chance of occurring, and if all Ss are able to 
perform all five solutions, one would expect 
about 20% solutions to the problem. Maier’s
conclusion about the presence of the necessary
“parts” therefore presupposes that the criterion 
for the solution scored as correct is clear to all Ss. 

As criterion for solution Maier stated: “A 
good, firm construction would, of course, be 
the best” (2, p. 118). This is a vague criterion, 
and students argue that there is more than one
solution satisfying the criterion. Weaver and 
Madden (5) have called attention to one solution 
where the longest pole is not pressed against the 
ceiling, but held in position horizontally by the 
table clamp used to combine the two vertically 
placed poles. It is also possible to make steady 
tripods, or steady constructions from the table. 
Furthermore it is possible to combine the longest 
and the shortest poles by the table clamp and
wedge them between floor and ceiling over one 
crossmark and then combine the two middlemost 
poles by part of one of the wires and wedge these 
between floor and ceiling over the other cross- 
mark. Finally, various of these devices for 
constructions can be combined. 

One might perhaps think that giving Ss 
unlimited time, as done by Maier, would ensure
that Ss perform all types of solutions they
are capable of. Since it is not reasonable to 
assume that the resourcefulness of an S will 
remain stable over a longer period of time, this 
procedure does not remove the previous objec- 
tion. Also important is the fact that the 
“presence” of the necessary “principles” for the 
solution of the problem may also be interfered 
with when an S has started along what is 
considered a wrong attack on the problem. 

The criterion for the solution being vague, it 
is important to know what type of instructions 
are given to Ss arriving at or absorbed for a 
longer period of time in types of solutions not 
scored as correct. Maier has not stated his 
procedure on this point. In the repetition of the 
experiment the following procedure was adopted. 
When S had found a solution not scored as 
correct, he was told: “This is a good solution. 
I would, however, like you to find another one 
where you utilize the three principles I demon- 
strated to you.” Since the criterion for a good, 
firm construction is vague this procedure may 
be liable to a certain arbitrariness. As a check, 
the number of times this instruction is given in 
the different groups is indicated in reporting the 
results below. 

In addition to the vague formulation of the 
criterion for correct solution, there are points in 
Maier’s instructions that appear to be ambigu- 
ous. Weaver and Madden (5) have called 
attention to the fact that the introduction of 
Direction may be misleading. The instructions 
also contain a reference to the table which may 
be misleading. The instructions were, however, 
not changed on any of these points.

The vagueness of the criterion for correct
solution may to some extent be reduced if it is 
clear to S that the correct solution involves the 
use of the three principles demonstrated in 
Parts A, B, and C. Maier instructed his Groups 
III and V to use these principles. In translating 
this instruction the last point was found to be 
confusing and was left out (“You do not have 
to use them [the principles], but only by using 
them will you get the most satisfactory solution. 
So try to use them” [2, p. 121]). It is still clear
from the preceding sentences that the three 
principles are to be used. Since it appears to 
be essential to Maier’s conclusion that it is clear 
to S that the correct solution must utilize the 
principles, it cannot be considered a violation 
of requirements to a repetition of the experiment. 
A point about Maier’s interest being in the 
qualitative side in the beginning of the instruc- 
tions was also left out since this might also be 
misleading. 


Results 


Table 1 gives the number of solu- 
tions for the Direction and No- 
Direction groups in Situations A and 
B. Inspection of Table 1 shows that 
none of the differences attributable to 
Direction is statistically significant. 
A χ² analysis (Yates’ correction)
showed that none of the differences 
attributable to Situation A vs. Situ- 
ation B is statistically significant. 

Of the Ss who solved the problem, 
two in Group ND, Situation A; one 
in Group D, Situation A; and one in 
Group D, Situation B were asked to 
find a new solution after having found 
one not considered correct. This 
additional instruction cannot there- 
fore have affected the results ap- 
preciably. 

In order to supplement the data
presented in Table 1, the various
types of “activities” exhibited in the
four groups during the problem session
were tabulated (Table 2). Since
Ss not solving the problem are the
only ones observed during the same
period of time, Table 2 contains only
their activities.


TABLE 1

Number of Solutions by Groups in Exp. I

[Columns: Situation; No-Direction; Direction. Rows: A, B, Both. Only one entry per row is legible in the source: A, 7; B, 11. The remaining cells are illegible.]


TABLE 2

Distribution of “Activities” of Failing Ss in Exp. I
(Only activities of Ss having failed the problem are included)

Activity        Situation A            Situation B
(Use of)     Group ND   Group D     Group ND   Group D     All

Walls           3         10           7          9         29
Ceiling       4(3)*      6(4)*       5(1)*      3(0)*     18(8)*
Table           5          2           9          6         22
Floor           8          4           0          1         13

All            20         22          21         19         82

* Frequency of Weaver and Madden (5) solution
involving ceiling.



“Activity” is classified as any attempt lasting 
2 min. or more at solving the problem in one of 
the following four ways: by using the walls, the 
ceiling, the table, or the floor. In many cases an 
activity would comprise the use of more than one 
of the objects indicated in the table. An 
activity might, e.g., involve the combined use
of both wall and ceiling or of both table and floor.
In the classification the following rules are
observed. When more than one object is involved
the objects are given priority in the
following order: (a) wall, (b) ceiling, (c) table,
(d) floor. Thus, if an activity involved the use
of both ceiling and wall, it is classified under
“use of wall,” while an activity comprising the
use of both table and floor is classified under
“use of table.” If two types of activities
occurred at the same time both activities are
classified. An S may, e.g., construct one
pendulum from the table and one between
ceiling and floor. This is then classified under
“use of table” and also under “use of ceiling.”
If S made more than one construction involving 
the same object, it is only tabulated once. 
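
The scoring rules just described can be written out as a short procedure. The Python sketch below is only an illustration: the names `classify_activity` and `classify_session`, the representation of each construction as a set of object names, and the sample session are assumptions made here and are not part of the original scoring protocol.

```python
# Priority order stated in the text: wall, ceiling, table, floor.
PRIORITY = ["wall", "ceiling", "table", "floor"]


def classify_activity(objects_used):
    """Return the single category for one activity (hypothetical helper).

    An activity combining several objects is scored under the
    highest-priority object it involves.
    """
    for name in PRIORITY:
        if name in objects_used:
            return name
    raise ValueError("activity uses none of the four scored objects")


def classify_session(activities):
    """Tabulate one S's activities, counting each category only once.

    `activities` is a list of sets of object names, one set per
    construction attempt lasting 2 min. or more.
    """
    categories = {classify_activity(a) for a in activities}
    # Report categories in priority order so the result is stable.
    return [name for name in PRIORITY if name in categories]


# Invented session: ceiling + wall, then table + floor, then table again.
session = [{"ceiling", "wall"}, {"table", "floor"}, {"table"}]
print(classify_session(session))
```

The example from the text, one pendulum from the table and one between ceiling and floor, corresponds to `classify_session([{"table"}, {"ceiling", "floor"}])`, which yields both “ceiling” and “table.”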

A solution described by Weaver and Madden was noted previously. The
number of times this specific construction was performed in the
different groups is indicated by parentheses under “use of ceiling.”

In evaluating the possible effect of the ceiling it is most reasonable
to include the solution described by Weaver and Madden (5). It will be
seen that if this solution is considered correct, the arithmetic
difference in favor of Situation B is reduced: 23 to 22 when this
solution is included, as against 22 to 15 when excluded.

Table 2 does not warrant detailed analysis 
because the criteria for scoring the activities are 
rather loose, the numbers in each category are 
small, and the activities are probably inter- 
dependent. Some tentative conclusions might, 
however, be drawn. 

In “use of walls” there is in Situation A a 
difference (χ², P = .01) between Group D and 
Group ND, the first group using the walls more 
frequently. This finding suggests that Direc- 
tion has the effect of turning the attention of 
some Ss away from the ceiling. It will be re- 
membered that these Ss were also instructed to 
use Part C. If the ceiling is left out of con- 
sideration, the only two surfaces left for the 
wedging involved in Part C are the two walls. 
Weaver and Madden’s (5) results for their 
Group IV (given only Direction) suggested that 
the introduction of Direction may lead some Ss 
to concentrate on the floor. Bearing in mind 
the differences in treatment of Weaver and 
Madden’s group and this Direction group, there 
is no necessary conflict between their results 
and these. 

In “use of floor” there is a significant difference (χ², P = .001)
between Situation A and Situation B. Assuming this is not explained by
interdependence between the various activities, the difference
indicates that attention is more strongly focused on the floor in
Situation A than in Situation B.

In “use of table” there is a significant difference (χ², P = .01)
between Situation A and Situation B. Again, under the assumption that
the difference is not explained by interdependence between the various
categories nor by a greater amount of over-all activity in Situation B
as compared to Situation A, the difference indicates that the
miniature table in Situation B is more in the focus of attention than
the full-sized table in Situation A.




By means of the protocols it was possible to test an hypothesis that
had occurred to E during the collection of the data. With regard to
the material, the following specific piece of information would have
to be acquired by S before he could solve the problem in the correct
way: he would have to notice that the distance between the crossmarks
was exactly the same as the length of the longest pole. It might be
thought that some Ss did not acquire this information during the
problem session, and for this reason did not solve the problem.
Assuming this to be so, one would expect more Ss of those who solved
than of those who did not solve the problem to compare the length of
the pole with the distance between the two crossmarks, especially as
this is the only way to ascertain the information. The protocols do
not reveal any significant difference in this respect between those
who solved and those who did not solve the problem, the figures for
Situation A being 12 (15 Ss) against 25 (29 Ss).
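
The comparison of solvers and non-solvers can be checked with a 2 × 2 χ² test. The sketch below rests on two assumptions: that the figures mean 12 of 15 solvers and 25 of 29 non-solvers made the comparison, and that the Yates-corrected χ² used elsewhere in this report is appropriate here (the text does not say which test was applied to these figures). The function name `yates_chi_square` is illustrative.

```python
import math


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi_square, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    diff = max(abs(a * d - b * c) - n / 2.0, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * diff ** 2 / denom
    # With 1 df the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Assumed reading of the Situation A figures:
# solvers: 12 compared, 3 did not; non-solvers: 25 compared, 4 did not.
chi2, p = yates_chi_square(12, 3, 25, 4)
print(f"chi-square = {chi2:.3f}, p = {p:.2f}")
```

Under these assumptions the statistic falls far below 3.84, the value required for significance at the 5% level with one degree of freedom, which agrees with the statement that the protocols reveal no significant difference. Two of the expected cell frequencies are small, so the result should be read as a rough check only.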


Discussion 


The results indicate that Direction 
has no effect on the number of solutions, 
nor would the use of the ceiling appear to 
constitute a major source of difficulty. 
The groups are relatively small, but lend 
themselves to one and the same inter- 
pretation. If it is assumed that the use 
of the ceiling is not unavailable to Ss 
from this population, it is not to be 
expected that Direction will have any 
effect. It is reasonable to assume that, 
for some Ss, the introduction of Direction 
will mainly have the effect of giving 
added emphasis to the use of the ceiling. 
(For Ss not having the three principles 
available, the effect of Direction may of 
course be a different one, as suggested in 
the previous section under the analysis 
of activities carried out by Ss failing to 
solve the problem.) 

In evaluating the results it should also 
be noted that Direction does not have 
any effect either in the full-sized hallway 
or in the miniature. Likewise it should 
be noted that the ceiling has no effect 
either on the No-Direction groups or the 
Direction groups. The conclusion, with 
regard to Direction as well as to the 
effect of the ceiling, is thus substantiated 
by results achieved by two independent 
groups. 

Maier obtained only one solution in 
his No-Direction group (Group III); 
whereas, in the present study seven 
solutions were obtained in Group ND in 
the full-sized hallway. As will be under- 
stood this No-Direction group differs in 
important respects from that of Maier. 
There may be cultural differences and 
differences of intelligence, and further, 
as noted under Method, the No-Direction Group of this experiment was in
some respects treated differently from 
that recorded by Maier. 


Experiment II 


The logical analysis of Maier’s experimental report revealed that his
conclusion rested on at least three important untested assumptions.
Since the use of the ceiling did not appear to constitute a major
source of difficulty for Ss, it was considered feasible to ignore
Assumption a and turn to Assumptions b and c. Assumption b states that
the demonstrations of the Parts were conceived by Ss as intended by
Maier, and Assumption c that the statement of the problem was clear to
S. As an investigation into the problems involved in Assumption b
appeared to be more inviting from a theoretical point of view, this
was chosen for further study. It was decided to make an attempt at
ascertaining the tenability of Assumption c by changes in the
procedure.

The protocols of Exp. I suggested that 
not all Ss had acquired an adequate 
understanding of Part C, the demon- 
stration in the doorway. During the 
problem session some Ss had concentrated
on the doorway. This indicated 
that they did not conceive of the 
demonstration of Part C as involving a 
general principle. Furthermore many 
Ss, when asked about Part C after the 
problem-solving session, did not seem 
to have understood this demonstration, 
as required by Maier’s conclusion. 

The simplest and probably best check 
on whether a certain principle is available 
to S, is to set him a task where the 
principle is to be used to achieve a 
solution. It was found that the demonstration
of each of the three Parts could 
easily be changed into a task to be pre- 
sented to Ss. Accordingly, instead of 
demonstrating the three Parts, the Ss 
were set three definite tasks involving 
the same constructions as each of the 
three demonstrations. 


Method 


Instructions and procedure.—In formulating the instructions for the
three tasks Maier’s original formulations were adhered to when
possible. For Ss who were set the three tasks, the following
instructions were given: 

“You will be given a number of problems to solve: some of these may be
easy and some a little more difficult. [Task 1] You know what a plumb
line is like? [A plumb line was explained if unfamiliar to S.] Your
first problem consists in making a plumb line with the help of these
objects. [S is given pencil, string, and burette clamp.]

“[Task 2. S is seated on a chair and a box of matches placed on the
floor at a fixed distance from him.] We would like to see if you can
get hold of this box of matches. You are not allowed to move from the
chair. You have these objects at your disposal [S is given two poles
and a clamp].

“[Task 3. S is taken to doorway.] Suppose you were to fasten a lantern
screen here [E points to doorway]. The screen should be fastened so
that it is kept taut up and down, and on a level with these four marks
[E points to four marks, two on either side of doorway]. You need only
demonstrate how you would fasten the screen on one side. At your
disposal you have these objects [S is given two poles]. You may use
this saw for sawing the poles [S is given saw], but you may not use it
for any other purpose. The poles may only be sawed across at right
angles. [The distance between the two paint marks on either side of
the doorway was 40 cm. and the lengths of the two poles were 50 and
120 cm.]”

When instructions for the three Tasks had 
been given, the problem was stated as in the 
previous report. In Maier’s original design 
there followed the instruction to use the three 
principles. This instruction had to be rewritten 
in order to conform with the instructions 
governing the three Tasks. The following 
formulation was used. (Again Maier’s in- 
structions were adhered to when possible): 
“The three problems you solved just now each 
contain a principle. If you combine these 
principles in the right manner, you will get the 
best solution to the problem. Try to use them: 
they are the solution in three separate parts. I 
will briefly repeat the three problems. You 
combined a burette clamp, a pencil, and a string 
to obtain a plumb line. This was to show you 
how it is possible to combine certain objects and 
so get the qualities we desire. 

“In the second problem you were able to get hold of a box of matches
by joining two poles together to make a longer one. Thus you see how
we can make one long pole out of two shorter ones.

“Then you were asked to fasten a screen. This you did by placing one
pole flat against the wall of the doorway, thus keeping your cloth
taut up and down. Then you placed the other pole at right angles to
it, and in this way wedged it in the doorway. Thus we could do without
hammer and nails.”

The instructions to use the three principles 
were only given to Ss solving the three Tasks. 
To other Ss only the statement of the problem 
was given. 

In Exp. I were mentioned certain points in 
the instructions where possible ambiguities 
could arise. The impression gained from ob- 
serving the behavior of Ss in Exp. I was that 
these ambiguities, apart from that involved in 
the statement of the criterion for solution, were 
not a serious hindrance for the attainment of the 
solution. However, if S performed a solution 
not scored as correct, he was asked to find 
another solution where the principles were 
utilized. But as compared to Exp. I the 
criterion for steadiness was loosened so that this 
instruction could be introduced more frequently. 
The Ss not solving the three tasks were also 
asked for new solutions, but as no reference had 
previously been made to the three principles, 
these were not referred to. 

The Ss were allotted 5 min. for each of the 
three Tasks and 40 min. for the pendulum 
problem. 

Subjects.—The Ss were 60 male students in 
their last year of high school. The average age 
was 19. Only students majoring in mathematics 
and physics were asked to volunteer. These 
students have had a very similar program of 
instruction all through school. The students 
were each paid three Norwegian crowns for 
participation, and were asked individually not 
to talk of the experiment. All Ss came from 
schools in the city of Oslo. 

Design.—The students were randomized 30 
to each of two groups. One group (Group T) 
was presented with the principles as tasks and 
otherwise treated as described above. The other 
group (Group ND’) was treated in the same way 
as the No-Direction group of Exp. I. 


Results and Discussion 


The Ss solving Tasks 1, 2, and 3 in Group T are designated Group T_s,
and those not solving are designated Group T_f. Table 3 presents the
number of solutions for the three groups: ND’, T_s, and T_f. All Ss of
Group T solved Tasks 1 and 2. The difference in the number of
solutions between Groups T_s and ND’ is significant beyond the 5%
level (χ²). The number of times Ss of Group T_s and of Group ND’ were
asked for new solutions was 15 and 20, respectively. This instruction
is thus given less often to Group T_s and cannot have affected this
group more favorably than Group ND’.


TABLE 3

Number of Solutions in Exp. II

Group    Number of Solutions    Total Ss
ND’                                30
T_s                                28
T_f                                 2


The superiority of Group T_s over Group ND’ might be attributed to
various factors: (a) greater familiarity with the materials in terms
of time spent with them, (b) greater emphasis on the utilization of
the three “principles,” (c) more appropriate state of motivation,
(d) better understanding of the “principles.”

To estimate the effect of Factor a the total time spent on the three
Tasks and the problem can be added up for every S solving the problem
in Group T_s. If this total does not exceed 40 min. these Ss have not
been given more time on the material than available to Ss of Group
ND’. The protocols show that, apart from some occasional fumbling,
Tasks 1 and 2 are solved almost instantly by all Ss. If the time spent
on Task 3 and the problem is added up individually for every S solving
the problem in Group T_s, only one S is found to have spent more than
40 min. This S spent 2 min. and 15 sec. in excess of the 40 min.,
which is not appreciably more than the Ss in Group ND’ would spend on
the demonstration of Part C. The effect of familiarity with material
may thus fairly safely be ruled out as a decisive factor.

The effect of Factor b is not as easy to estimate. On general grounds
it is improbable that the demonstration of the three “principles”
should not emphasize, just as strongly as the presenting of the three
principles as tasks, that they were to be utilized for the correct
solution. This reasoning cannot of course rule out this factor.

Factor c is difficult to assess since it is difficult to know what is
an optimal state of motivation for problem solving. It is probably one
that is neither too high nor too low. The impression of the observers
was that the two groups of Ss were all well motivated. It will be
remembered that they were paid for taking part in the tests. On
general grounds one would not expect the solution of three such simple
tasks to affect the state of motivation for Ss of this age. Factor c
cannot, however, be ruled out completely. Since Factors b and c are
not eliminated, d alone is not decisive.

The behavior of some Ss in Group T_s indicated that Task 3 was not a
sufficiently good test to ascertain the understanding of
“Principle C.” Just as in Exp. I some Ss seemed to concentrate on the
door, where Task 3 had been performed. This suggests that to them
Task 3 did not embody a general principle. Questioning of Ss in Group
T_s who failed to solve the problem showed that some of them
maintained they had utilized the three “principles” in some
constructions where E could not recognize them. From a logical point
of view, the analysis of “Principle C” may be divided into two stages:
(a) the pressing of one pole against a surface to keep something taut,
and (b) the keeping of a pole in a fixed position by pressing it
between two surfaces. Our questioning of Ss indicated that some Ss
conceived of “Principle C” as containing only the second stage.


Experiment III 


Experiment III was designed with a view to improving the test for the
understanding of “Principle C” and with a view to checking Factors b
and c. By introducing a few minor changes in Task 3 it was hoped to
obtain a group of Ss who could be assumed to have an understanding of
“Principle C” and to be equal to Group T_s with regard to Factors b
and c.


Method 


Procedure.—The only modification was with regard to Task 3. Instead of
instructing S to find a way of setting up a screen in the doorway, he
was placed in the middle of the room and instructed as follows: “You
must have seen a flag hanging down limply when there is no wind.
Suppose you were to hang up a flag in this room, about 150 cm. above
the floor [E held his hand out at about that height]. On one side the
flag has to be kept taut, while the rest of it must be allowed to hang
freely. The side of the flag is supposed to be this long [E indicates
33 cm. on a meterstick]. You have these objects at your disposal [S is
given two poles]. This saw you may use to saw the poles [S is given
saw], but you may not use the saw for other purposes. The poles may
only be sawed across at right angles.” The instruction was repeated
and elaborated until it seemed clear to S. The lengths of the poles
were 50 and 120 cm., respectively. The modified Task 3 (designated
Task 3’) was to be performed in the doorway, in the same way as
Task 3.

Subjects.—The Ss were 62 high school 
students. They were 1 yr. younger than Ss in 
Exp. II; in other respects they were the same. 
To an experimental group (Group T’), treated 
as described above, were randomized 44 Ss 
(70%) and to a control group (Group ND”) 
18 Ss (30%) treated as Group ND’. For Task 
3’ the Ss were allotted 15 min. 


Results 


Group T’ is split into two groups, Group T’_s, containing Ss solving
Task 3’, and Group T’_f, containing Ss not solving Task 3’. Table 4
gives the number of solutions for the three groups: ND”, T’_s, and
T’_f. When Group T’_s is compared with Group T_s of Exp. II by χ²
test, the difference is found to be significant at the 5% level. The
number of times Ss of Groups T’_s, T’_f, and ND” were given the
instruction to find a new solution was 6, 18, and 8, respectively.
This instruction is given relatively less often to Group T’_s than to
Group T_s of Exp. II and cannot therefore have affected the former
more favorably than the latter.


TABLE 4

Number of Solutions in Exp. III

Group    Number of Solutions    Total Ss
ND”               5                18
T’_s             15                16
T’_f              0                28

All              20                62

The only person who failed to solve 
the problem in Group T’_s produced 
the construction described by Weaver 
and Madden (5). Since an S who 
has carried out this particular con- 
struction cannot readily be expected 
to change it into the correct one when 
asked for a new solution, as he is 
liable to start working on an entirely 
different type of construction, it is a 
debatable point whether his perform- 
ance should be included among the 
solutions or not. Since the criteria 
for correct solution had been stated 
so as to involve the three “principles,” 
his performance is not scored as 
correct. 

A comparison of the number of 
solutions in Group ND” with the 
number in Group ND’ reveals no 
difference. Group ND’ and Group
ND” are, therefore, combined into a
Group ND’+”. A comparison of the
number of solutions in Group T’ with
the number in Group ND’+” reveals
no statistical difference. 


Discussion 


Task 3’ is different from Task 3. Nevertheless it is unlikely that the
solution of Tasks 1, 2, and 3’ would emphasize more strongly that the
“principles” were to be used than would the solution of Tasks 1, 2,
and 3. The fact that nearly 100% of the Ss in Group T’_s solved the
problem also indicates that it was clear to Ss that the three
“principles” were to be utilized for the correct solution. Therefore
Factor b, as stated in the discussion of the results of Exp. II,
cannot account for the superiority of Group T’_s over Group T_s.

Likewise, it is unlikely that the solution of Tasks 1, 2, and 3’ would
produce a more appropriate state of motivation than would the solution
of Tasks 1, 2, and 3. Furthermore, it seems reasonable to expect that
the Ss of Group T’_s would solve the problem in less time than the Ss
of Group T_s if they were motivated in a more appropriate manner,
particularly since Group T’_s might be considered more select than
Group T_s. (As will be remembered, a smaller percentage solved
Task 3’ than Task 3.) The protocols reveal that the Ss of Group T’_s
spent on an average 19 min. 49 sec. on the problem, as against 17 min.
40 sec. in Group T_s. Therefore, it seems reasonable to conclude that
the superiority of Group T’_s over Group T_s is not attributable to
Factor c.
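
The proportions behind this argument can be recomputed from counts given in the text and tables. In the sketch below, the figure of 28 of 30 Ss solving Task 3 is read from Table 3, and the figure of 16 Ss solving Task 3’ is an inference made here (the 15 solutions among Task 3’ solvers plus the one S reported to have failed), not a count stated directly. Variable names are illustrative.

```python
from fractions import Fraction

# Exp. II: 28 of the 30 Ss in Group T solved Task 3 (read from Table 3).
task3_pass = Fraction(28, 30)

# Exp. III: 44 Ss were set the tasks. Assumed: 16 solved Task 3'
# (15 problem solutions + the 1 S reported as failing the problem).
solved_task3_prime = 15 + 1
task3_prime_pass = Fraction(solved_task3_prime, 44)

# Share of Task 3' solvers who went on to solve the pendulum problem.
problem_rate = Fraction(15, solved_task3_prime)

# Mean time spent on the problem, converted to seconds.
mean_time_exp3 = 19 * 60 + 49  # 19 min. 49 sec.
mean_time_exp2 = 17 * 60 + 40  # 17 min. 40 sec.

print(f"Task 3 pass rate:     {float(task3_pass):.0%}")
print(f"Task 3' pass rate:    {float(task3_prime_pass):.0%}")
print(f"Problem solved:       {float(problem_rate):.0%}")
print(f"Extra time, Exp. III: {mean_time_exp3 - mean_time_exp2} sec.")
```

On these figures the Task 3’ solvers form a much smaller fraction of their group than the Task 3 solvers did, yet they took about two minutes longer on the problem on average, which is the observation the paragraph relies on in setting Factor c aside.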

In Exp. II it was found that the results 
could not be interpreted in terms of 
greater familiarity with the material. 
Since Ss were given more time on Task 
3’ than on Task 3 these results must also 
be analyzed with regard to Factor a 
(familiarity). By the same type of 
reasoning as that used in Exp. II it is
found that only three Ss of Group T’_s 
spent more than a total of 40 min. on 
the material. Of these three Ss two 
spent less than 2 min. more. Therefore, 
familiarity with the material can hardly 
be considered a factor differentiating 
between the two groups. 

The results of the experiment seem to warrant the conclusion that the
understanding of “Principle C” must be considered a decisive condition
in accounting for the results of Maier’s original experiment. A
reasonable interpretation of the results of the three experiments
seems to be that the percentage of solutions in the different groups
increases with degree of availability of the “principles.” Tasks 1, 2,
and 3’ (Exp. III) are probably the sharpest test for availability, and
among those passing this test there is the largest percentage of Ss
having the three “principles” available and thus solving the problem.
Tasks 1, 2, and 3 constitute the next sharpest test of availability,
and among those passing this test there is a smaller percentage having
the “principles” available and thus solving the problem than in the
preceding group. The mere demonstration of the “principles” provides
the least adequate control over availability, and among the Ss
subjected to this procedure there is the smallest percentage having
the “principles” available and thus solving the problem.

The fact that there is no statistical 
difference between Group T’ and Group 
ND’+” in the number of solutions, might 
indicate that the pendulum problem and 
Task 3’ were identical in the sense that 
not only did a group of Ss solving Task 
3’ also solve the pendulum problem, but 
a group of Ss solving the pendulum 
problem would solve Task 3’. This 
conclusion is probable, but would not 
be warranted unless it could be shown 
that the Ss of Group T’_f did not solve 
the pendulum problem after the three 
“principles” had been demonstrated to 
them. As will be remembered, these 
Ss received no hint that could make 
“Principle C” available. Furthermore, 
they did not receive the instruction that 
the correct solution required the utili- 
zation of the three “principles.” 


Discussion 


The results of Exp. III give added 
evidence for the conclusion arrived at in 
Exp. I about the effect of Direction and 
of the ceiling. For Ss having solved 
Tasks 1, 2, and 3’ (Group T’_s) no 
specific Direction seems to be necessary, 
since they all with one exception solved 
the pendulum problem. Likewise the 
use of the ceiling seems to be available 
to all Ss having solved the three Tasks. 
The results of all three experiments 
consistently support the conclusion that 
Direction as introduced by Maier is of 
no effect and likewise that the use of the 
ceiling, as hypothesized by the present 
writer, does not constitute a major 
difficulty. It should further be noted 
that even if Direction, contrary to these 
results, should be found to have an 
effect, Maier’s conclusion about the 
insufficiency of past experiences will not 
be valid. Experiments II and III indicate 
that his untested assumption, that the 
three demonstrations were conceived of 
as intended by him, is not valid. 

The high proportion of solutions 
(almost 100%) in Group T’_s (the Ss 
having solved Tasks 1, 2, and 3’) may 
permit a more far-reaching conclusion 
than the one expressed above, namely: 
An S drawn from the same population 
as the Ss of these experiments will solve 
the pendulum problem if he has an 
adequate understanding of the three 
“principles” as stated by Maier. The 
combination of these three “parts” thus 
does not seem to involve difficulties for 
Ss and lead to failure in solving the 
problem. In view of the relatively small 
groups employed and the crude measures 
involved, this conclusion must be tenta- 
tive. However, the results of two 
previous experiments (3, 4) suggest that 
close to 100% solutions will be obtained 
in the group having available what may 
be considered the necessary “principles.” 

At this point it cannot of course be argued that Part C was a test of
ability to combine the three “parts” (since this would lead to the
conclusion that the ability to combine the three “parts” might be
adequately tested on a single one of these “parts”). One might,
however, argue that Part C could consist of smaller “parts” and that
Task 3’, therefore, still can be considered a test of S’s ability to
combine some sort of “parts.” In whatever way this is viewed, the
results clearly restrict the range over which an ability to combine
“parts” operates. The present results provide no evidence that any
type of ability to “combine parts” affects the number of solutions to
the Maier pendulum.

As for the old issue whether problem solving should be considered part
of “learning” or a distinct problem area, the present results suggest
that it is part of “learning.” At the present level of analysis it
seems possible to account for problem solving with few additional
statements, if an adequate learning theory were available. Harlow (1)
centers his analysis of problem solving around what he has called “the
organized response pattern.” The present results may be regarded as
evidence of the fruitfulness of this point of view. The present
analysis seems to leave no room for Gestalt-oriented views on problem
solving as hitherto expressed. Terms like “re-structuring,”
“re-centering,” and “direction” seem unnecessary in order to account
for the results. This does not, of course, mean that there is not some
sort of “set” or “Einstellung” operating in problem solving. On the
contrary, these are key terms in studies of problem-solving behavior.
Nor does it mean that there is no role for Gestalt principles in
accounting for problem solving. What is concluded is that “set” or
“Einstellung,” or Gestalt principles do not operate at the level of
analysis used in interpreting the results presented here and earlier
(3, 4).


SUMMARY 


Maier’s pendulum problem is described and a 
logical analysis presented of certain points in 
his design. Experiment I is then designed to 
test the hypothesis that the difficulty of the 
problem to most Ss hinges around the use of the 
ceiling for the correct solution. The experiment 
is repeated under two sets of conditions: (a) in a 
hallway having approximately the same dimen- 
sions as that used by Maier; (b) in a miniature 
model of the hallway. The assumption is that 
the ceiling is available to more Ss in the latter 
situation. Under both sets of experimental 
conditions a Direction group and a No-Direction 
group were used. The results give no support 
to Maier’s conclusion that “Direction” has an 
effect, nor to the hypothesis that the difficulty 
of the problem hinges around the use of the 
ceiling. 

Experiment II was designed to achieve a 
better control than in the original experiment 
on whether or not the “principles” demonstrated 
by Maier had been understood. Instead of 
being demonstrated to S, the “principles” are 
presented as definite tasks to be performed. 
The results show that a group of Ss performing 
adequately on the three tasks produced more 
solutions to the problem than a group of Ss to 
whom the principles had merely been demonstrated. 
Some possible interpretations of these 
results are discussed. 

Experiment III involved a still better control 
over Ss’ understanding of the “principles”; here 
the third preliminary task was reformulated. 
The results reveal that close to 100% of the Ss 
who performed adequately on Task 3 solved the 
problem. 

It is concluded that the understanding of the 
“principles” is a decisive condition in accounting 
for the results on Maier’s pendulum problem. 


REFERENCES 


1. Harlow, H. F. Thinking. In H. Helson (Ed.), Theoretical foundations
   of psychology. New York: Van Nostrand, 1951. Pp. 452-505.

2. Maier, N. R. F. Reasoning in humans: I. On direction. J. comp.
   Psychol., 1930, 10, 115-143.

3. Saugstad, P. Problem-solving as dependent on availability of
   functions. Brit. J. Psychol., 1955, 46, 191-198.

4. Saugstad, P. Problem solving and availability of functions. Acta
   Psychol., (in press).

5. Weaver, H. E., & Madden, E. H. “Direction” in problem solving.
   J. Psychol., 1949, 27, 331-345.


(Received July 20, 1956) 





Journal of Experimental Psychology 
Vol. 54, No. 3, 1957 


SERIAL EFFECTS IN RECALL OF UNORGANIZED AND 
SEQUENTIALLY ORGANIZED VERBAL MATERIAL ¹ 


JAMES DEESE AND ROGER A. KAUFMAN ² 
The Johns Hopkins University 


One of the best established gen- 
eralizations in the study of verbal 
learning is found in the serial position 
effect for the learning of homogeneous, 
discrete verbal items by the method 
of serial anticipation. Some form of 
the classical serial position curve is 
found for a considerable variety of 
verbal material and conditions of 
testing; the essential restriction seems 
to be that the learning and/or recall 
be by the method of serial anticipation 
or some modification of it (5). Sev- 
eral studies (4, 8) show that there is 
quite a different form of the serial 
position effect when free recall is the 
method of testing employed. With 
the method of free recall, the middle 
items are less frequently recalled, the 
first items are moderately well re- 
called, and the last items are most 
frequently recalled. Thus, the serial 
position curve for serial anticipation 
and that for free recall are roughly 
mirror images of one another. This, 
of course, is a qualitative comparison, 
since the exact form of the curves 
will depend upon the material and 
method of testing. 

Recently, Bousfield, 
Silva (2) have noted that in free 
recall the order in which items are 
recalled depends upon their proba- 
bility of being recalled. Items fre- 
quently recalled by everyone are apt 
to be recalled first by particular indi- 
viduals. Since, in free the 


Cohen, and 


recall, 


''The research reported in this paper was in 
part supported by funds from the National 
Science Foundation grant NSF—61369. 

2 Now at the University of California,
Berkeley. 


last items are most frequently re- 
called, it suggests, for homogeneous 
material, that the last items should be 
recalled first and the middle items 
last. It is clear, however, that free 
recall of ordinary English textual 
material does not happen in this way 
(so perhaps it is not really free recall). 
We ordinarily do not recall the last 
words of a passage of prose first and 
the middle words last. In general, 
we recall the first words first and the 
last words last. Thus, our recall of 
ordinary prose approximates the order 
of recall forced by the method of serial 
anticipation. It seems an obvious
conclusion that this is because the 
sequential nature of ordinary prose 
forces such sequential recall. 

The purpose of the experiments 
reported here is to examine the in- 
fluence of sequential structure in 
verbal material upon the order of 
recall of individual items and upon 
the serial position curve for frequency 
of recall. A formal hypothesis may 
be stated as follows: As increasing 
sequential structure is introduced into 
material given Ss with instructions 
for free recall, the order of recall and 
the serial position curve for frequency 
of recall will change from that char- 
acteristic of free recall of unstructured 
material to that characteristic of the 
learning or recall of unstructured ma- 
terial by the method of serial antici- 
pation. This implies that the serial 
position curve for a passage of ordi- 
nary prose in free recall, all other 
things equal, should be like that for 
nonsense material in serial antici- 
pation. 

In Exp. I, the characteristics of free
recall for serially unstructured ma-
terial and for connected prose are
compared. In Exp. II the effects of
variation in sequential structure are
examined by analyzing the changes in
patterns of recall in certain of the 
pseudosentences employed by Miller 
and Selfridge (3) in their study of the 
effect of sequential structure on 
amount retained in immediate free 
recall. 

EXPERIMENT I


Method.—Male undergraduate students at the Johns Hopkins University
were used as Ss in this study. Two groups of 16 Ss each were presented
with lists of words drawn randomly from the Thorndike-Lorge word list
(7). A third group of 27 Ss was presented with passages of connected
discourse made by altering selections obtained from the 1953 World
Almanac. Presentation was oral, and all Ss were instructed to try to
remember what they heard. A test for recall was obtained immediately
after each list or passage was read. For the lists of randomly selected
words, a verbatim transcription of recall was taken by E. For the
passages of connected discourse, a dictaphone recorded S’s recall and
the recording was later transcribed. A tape recorder was used to present
the passages of connected discourse; the same recording was used
throughout. The lists of random words were read orally by E (two Es
tested an equal number of Ss each) at the rate of 1 word per sec. Both
Es used in this and subsequent phases of the study were instructed and
trained to read the material at the appropriate rate without emphasis
or inflection.

One group of 16 Ss recalled 10 lists of words 
10 items in length. A second group of 16 Ss 
recalled 10 lists of words 32 items in length. 
The order of words in the lists was randomly 
scrambled for each S and each S was presented 
the lists in a different order. 

The passages of connected discourse were on three different topics, and
each of these three topics was presented to 27 Ss. The topics were,
respectively, “Montana,” “The Museum of Science and Industry,” and
“Bonneville Dam.” Each passage was approximately 100 words in length
and consisted of 10 simple statements organized into sentences and
clauses. The statements were such that, by minor rewording, they could
be presented in different orders. This is important to the problem
under study, since individual words and phrases in connected discourse
differ greatly in their ease of being recalled, and in order to study
the effects of serial order on recall it is necessary to randomize as
well as possible the location of individual items. Consequently, nine
different orders of statements in each passage were used. Two examples
of arrangements of the passage, “The Museum of Science and Industry,”
are given below:

1. “The Museum of Science and Industry is devoted to exhibits of
scientific and industrial processes. And it will always remain a picture
of modern civilization. Many mechanical displays can be operated by the
visitor, and for many years the most popular display has been an
operating coal mine. Perhaps the most unusual display is a room given
over to an operating radar center. Many other displays are devoted to
things of historical interest, and as the techniques of industry change,
the contents of the museum will change. The museum is located in South
Chicago, and it was founded by Julius Rosenwald. It occupies the grounds
of the old Columbian exposition.”

2. “Many displays are devoted to objects of historical interest at the
Museum of Science and Industry, which occupies the grounds of the old
Columbian exposition. The Museum is devoted to exhibits of scientific
and industrial processes, and perhaps the most unusual room is given
over to an operating radar center. The Museum is located in South
Chicago. It will always remain a picture of modern civilization, for as
the techniques of industry change, the contents of the Museum will
change. For many years the most popular display has been an operating
coal mine. Many mechanical displays can be operated by the visitor. The
Museum was founded by Julius Rosenwald.”

The word lists were scored by number of words per position recalled
correctly. The passages were scored by number of statements per position
recalled correctly; some latitude in exact wording was allowed in the
scoring (for example, the substitution of “exhibit” for “display”).
There was, however, 100% agreement between two scorers on a sample of
transcriptions of recall from five Ss.


Results.—The curves in Fig. 1 show the mean frequency per list with
which items in each position were recalled for all lists of each length.
For both the 10-word lists and the 32-word lists the highest frequency
of recall occurs at the end of the list. The frequency of recall for
initial items is relatively higher than in an earlier study (8), and
this is probably because ordinary words rather than nonsense syllables
were used in the present study. The mean order of recall of items as a
function of position is presented in Fig. 2. Comparing Fig. 1 and 2 it
appears that probability of recall and order are correlated. In general
the last items are recalled first and the middle items last. Estimates
of the concordance between frequency of recall and position in recall
were obtained by Kendall’s tau coefficient (6). The value of tau for the
10-word lists is .867 and tau for the 32-word list is .536. These are
positive since the item at the first position of recall was assigned a
rank of one and the item with the highest frequency of recall was
assigned a rank of one, etc. Both obtained taus are significant beyond
the 1% level.

Fig. 1. Mean frequency of recall per list per S for lists of randomly
arranged words as a function of position of items in original lists.

Fig. 2. Mean position of items in recall of randomly arranged words as
a function of position of items in original lists.
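
The rank correlation used here can be illustrated with a short computation. The following Python sketch computes Kendall's tau for two rankings that contain no ties; the function name and the example ranks are hypothetical and are not the data of the experiment. With 10 serial positions there are 45 pairs of items, so two rankings that disagree on 3 pairs give (42 - 3)/45 = .867, which is numerically the size of the coefficient reported for the 10-word lists.

```python
from itertools import combinations


def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau for two equal-length rankings with no tied ranks."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must have the same length")
    if len(ranks_a) < 2:
        raise ValueError("at least two items are required")
    concordant = 0
    discordant = 0
    # Compare every pair of items: a pair is concordant when both
    # rankings order the two items the same way.
    for (a1, b1), (a2, b2) in combinations(zip(ranks_a, ranks_b), 2):
        product = (a1 - a2) * (b1 - b2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(ranks_a) * (len(ranks_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks for 10 serial positions. Rank 1 means "recalled most
# often" in the first list and "recalled earliest" in the second.
# Three adjacent pairs are swapped, giving 3 discordant pairs.
frequency_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
recall_order_rank = [2, 1, 4, 3, 6, 5, 7, 8, 9, 10]
tau = kendall_tau(frequency_rank, recall_order_rank)
print(round(tau, 3))
```

Because both variables are ranked with 1 as the "best" value, agreement between frequency and earliness of recall appears as a positive coefficient, as the text notes.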

The serial position curve for the free recall of statements from the
passages of connected discourse is different from those presented in
Fig. 1. Figure 3 shows that the serial position curve for the textual
material is very much like the classical curve for the method of serial
anticipation; the highest frequency of recall is at the beginning of the
list, and the lowest frequency of recall is just past the middle. The
result suggests that the association processes in learning by the method
of serial anticipation may not be so different from the processes of
ordinary language as has sometimes been supposed. The rank order of mean
position in recall is perfectly correlated with the position of
statements in the original passage (tau = 1.0).

Fig. 3. Mean frequency of recall per passage per S for statements in
textual passages as a function of position of statements in original
passages.

The results of this phase of the 
study suggest that the variation in 
serial position curves for free recall is 
limited on one hand by free recall of 
disconnected material and on the 
other by free recall of sequentially 
connected passages like those used in 
the present study. It remains to be 
shown, however, that the differences 
between these two types of curves 
reflect the higher sequential organi- 
zation in the passages of connected 
discourse. The purpose of Exp. II 
is to examine this question. 


EXPERIMENT II ³


Method.—Forty Johns Hopkins undergradu- 
ates were used as Ss. The material presented 
to Ss consisted of the 50-item lists of words used 
by Miller and Selfridge (3) as part of their study 
of the influence of sequential dependency upon 
the amount of immediate memory. 

The Miller and Selfridge lists are made in 
such a way as to approximate by degrees the 
statistical sequential structure of ordinary prose. 
The lists with zero-order dependency were ob- 
tained by randomly selecting words from the 
Thorndike-Lorge wordbook; therefore, these 
lists are very much like the ones used in the first 
experiment. The first-order dependency lists 
were obtained by scrambling words from the 
higher order lists (in an attempt to approximate 
the frequencies of usage with sequential de- 
pendencies). The second-order lists were ob- 
tained by giving an S two words and asking him 
to make a sentence of them. The word S used 
following the initial two was added to the two 
and the first word of the two was dropped. This 
new group of two was given to a new S with 

3 Preliminary data on 30-word lists were
obtained. These data are not presented here 
because of some irregularities in procedure. 
They may be found, however, in a master’s 
essay by the junior author on deposit in the 
Johns Hopkins University library. 
instructions to make a sentence. Lists of words
were then constructed by chaining together the 
successive words of new Ss. Third-, fourth-, 
fifth-, and seventh-order lists were made in 
exactly the same way, except that longer chains 
of words were given to Ss. In addition, Miller 
and Selfridge present lists of ordinary English 
prose, and their 50-word textual passage was 
used in the present experiment. 
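
The chaining procedure described above can be sketched in code. The following Python example is only an illustration under stated assumptions: the human S who supplies the next word is replaced by a hypothetical stand-in (`make_corpus_respondent`) that looks up continuations in a small invented corpus, and all function names and the toy corpus are assumptions, not part of the Miller and Selfridge materials. What the sketch does preserve is the sliding context: the new word is added and the oldest context word is dropped.

```python
import random


def chain_words(seed_words, respond, length):
    """Build a word list by repeatedly asking `respond` for the next word.

    After each response the new word is appended and the first word of
    the context is dropped, as in the procedure described in the text.
    """
    context = list(seed_words)
    chain = list(seed_words)
    while len(chain) < length:
        next_word = respond(tuple(context))
        chain.append(next_word)
        context = context[1:] + [next_word]
    return chain


def make_corpus_respondent(corpus_words, context_size, rng):
    """Hypothetical stand-in for a human S, backed by a toy corpus."""
    continuations = {}
    for i in range(len(corpus_words) - context_size):
        key = tuple(corpus_words[i:i + context_size])
        continuations.setdefault(key, []).append(corpus_words[i + context_size])

    def respond(context):
        options = continuations.get(context)
        if not options:
            # Context never seen in the corpus: fall back to any word.
            return rng.choice(corpus_words)
        return rng.choice(options)

    return respond


# Invented toy corpus; a context of two words is used, as in the
# description of the second-order lists given above.
toy_corpus = ("the dog saw the cat and the cat saw the bird "
              "and the bird saw the dog").split()
rng = random.Random(0)
respond = make_corpus_respondent(toy_corpus, context_size=2, rng=rng)
word_list = chain_words(["the", "dog"], respond, length=12)
print(" ".join(word_list))
```

Lengthening the context passed to the respondent corresponds to the longer chains of words given to Ss for the higher-order lists.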

Eight lists (seven orders of approximation 
and textual passage) were presented to all Ss. 
The order of words in the lists was, of course, 
fixed by the sequential dependencies in all except 
the zero-order list. The zero-order list was 
presented in a different, randomized order to 
each S. Randomization of order of presen-
tation of the different lists was achieved by
repeating five times an 8 × 8 Latin square. The
lists were read to S by E at the rate of 1 word per
sec.; one E tested all Ss. The E requested recall
immediately after the reading of a list, and all 
responses made by S, in order, were recorded. 
In the analysis of recall, responses extraneous to 
the list being recalled were ignored. 
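
The counterbalancing scheme can be made concrete with a small sketch. The Python example below assumes a cyclic 8 × 8 Latin square; the text does not say which square was used, so the particular square, the function names, and the numbering of the lists 0 to 7 are all assumptions. Repeating the square five times yields 40 presentation orders, one per S, with each list occurring five times at each ordinal position.

```python
def cyclic_latin_square(n):
    """Return an n x n Latin square built by cyclic shifts of 0..n-1."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def presentation_orders(n_lists=8, repetitions=5):
    """One presentation order per S: the square repeated `repetitions` times."""
    square = cyclic_latin_square(n_lists)
    return [list(row) for _ in range(repetitions) for row in square]


orders = presentation_orders()
print(len(orders))   # number of Ss covered
print(orders[0])     # order of the eight lists for the first S
print(orders[1])     # order for the second S
```

A cyclic square balances ordinal position only; it does not balance which list immediately precedes which, and nothing in the text indicates whether that was controlled.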

Duplicate words appeared in all of the higher- 
order lists and the textual passage. Because the 
recall of such words would be ambiguous with 
respect to position, these words were not scored.


Results.—Frequency of recall as a 
function of order of items in the lists 
is presented in Fig. 4. Frequency of 
recall has been averaged by groups of 
five items in order to minimize the 
influence of particular high association 
words at unique and unalterable 
positions. The greater smoothness 
in the recall curve for the zero-order 
list is attributable to the fact that 
individual words could be randomized 
by position in this list. 

There is an increase in the total frequency of recall with an increase
in order of approximation, in confirmation of Miller and Selfridge (3).
The tau between frequency of recall for each order and order of
approximation is .857 (P < .01). Table 1 yields information on the
question of where, in the lists, the increase of frequency of recall
with higher-order approximations occurs. Here are presented the taus for
the separate positions in the lists (averaged by groups of five). All of
the obtained taus presented in Table 1 are significant at the 5% level
or beyond except those for the last three groups of five items and that
for the third group of five items. Therefore, it appears that the
greater portion of the increase in frequency of recall in the
higher-order approximations comes from the first two-thirds of the
lists.

Fig. 4. Mean frequency of recall per S for items in lists of various
degrees of sequential structure (orders of approximation) as a function
of position of items in original lists (averaged by groups of five
items). [Figure not reproduced. Panels are labeled by order of
approximation; abscissa: order in list (average of five); ordinate:
mean frequency.]

There is also a change in the form of the serial position curves. The
serial position curve for the zero-order approximation resembles the
curves for free recall of random material presented in Exp. I. The
curves for the higher-order approximations are apparently different,
however. In order to obtain some estimate of the extent to which the
shift in position of maximum recall is reliable, the tau statistic was
used again. The percentage of total recall scores found in items from
the first half of the list was computed for each order of approximation.
These percentages were correlated with the order of approximation, and
this yielded a tau of .643 (P = .01). Thus, for the higher-order
approximations a larger percentage of the total items recalled comes
from the first half of the list than is the case for the lower-order
approximations.


TABLE 1

Rank-Order (tau) Correlations Between Frequency of Recall and Order of
Approximation for Successive 10ths of the 50-Word List

Position in List | 1-5 | 6-10 | 11-15 | 16-20 | 21-25
tau              | [only .714** and .786** are legible in this row; the remaining entries and the column alignment are not recoverable]

Position in List | 26-30 | 31-35  | 36-40 | 41-45       | 46-50
tau              | .571* | .786** | .429  | [illegible] | .214

* .01 < P < .05.
** P < .01.


TABLE 2

Rank-Order (tau) Correlations Between Mean Position in Recall and (1)
Frequency of Recall, and (2) Order of Original List, for Each Order of
Approximation

Column headings: Order of Approximation | Position and Frequency | Position and List Order
[Table entries are not legible.]

* .01 < P < .05.
** P < .01.

In Exp. I it was shown for the free 
recall of random items that the order 
of recall was correlated with the 
frequency with which the individual 
items were recalled. For recall of the 
textual material, however, the order 
of recall was correlated with the order 
of presentation of items in the lists. 
The implications of these results for
Exp. II are: (a) the order of recall
(position of an item in recall) should
be correlated with frequency of recall
for the early orders of approximation,
and (b) the order of recall should be
correlated with the position in the
original list for the higher orders of
approximation.

Table 2 presents the tau coefficients 
for these two sets of correlations. 
The correlations between position in 
recall and frequency of recall are all 
positive (highest frequency and first
position given rank one), though only
those for the zero, first, second, third 
and fourth orders of approximation 
are significant beyond the 5% level. 
The correlations between position in 
recall and position in the original list 
are either very low or negative except 
for the seventh order of approxi- 
mation and the textual passage, both 
of which are highly significant. Thus, 
the seventh-order and textual pas- 
sages are recalled roughly in their 
order of presentation, while for the 
other passages the dominant relation- 
ship to order of recall is frequency of 
recall. 


Discussion 


The results of the present experiments 
show that the serial position curve of
frequency of recall varies with the
sequential structure of the material 
being recalled. The variation in serial 
position curves with changes in se- 
quential dependency is accompanied by 
variation in the order with which the 
items are emitted in recall. It is 
probable that the changes in serial 
position are dependent upon these 
changes in the order in which the items 
are emitted. 

For material which has little or no 
inherent sequential structure, items are 
emitted in a kind of primitive order of 
strength during test for recall. In this
connection, Bousfield, Cohen, and Silva 
(2) have pointed out the parallel to 
Marbe’s law. The present study shows 
that for unstructured material, in which 
the primitive order of strength deter- 
mines the order of recall, Ss emit the 
last items first on the average, the 
beginning words next and finally the 
words in the middle of the list. 

For material with high sequential 
structure the order of emission is 
principally determined by the order of 
items in the list, and furthermore, the 
serial position curve is much like that 
found in serial anticipation learning. 
Thus, as increasing sequential depend- 
ency is introduced into lists of words 
used in recall tests, Ss do more than 
simply organize the words into larger 
and larger chunks; they reorganize their 
patterns of verbal behavior to conform 
with long established habits of dealing 
with the sequentially structured material 
of ordinary language. 

The advantage gained in frequency of 
items recalled by the redundancy in 
higher-order lists seems largely to come 
from the beginning and middle of the 
lists. It is not certain, however, whether
this particular location of gain in recall
is associated with redundancy per se or
whether it is because the recall for the 
higher-order lists is correlated with the 
order of presentation. If the same shift 
in order of items recalled occurred in the 
experiments of Miller and Selfridge (3) 
as occurred in the present experiment, 
then it is not certain that all of the gain 
in recall with higher-order lists is asso- 
ciated with the greater redundancy of 
these lists; it is possible that it may be 
associated with the change in order of 
emission. From data in existence at the 
present it is impossible to tell if this is 
the case or not; however, because in- 
creasing redundancy in the sequentially 
organized lists is associated with re- 
organization of the emission of items in 
recall, the effect of sequential structuring 
is more complicated than it appeared to 
be at first. In order to further examine 
this question it will be necessary to 
compare the effects of different methods
of introducing redundancy into the lists
used in tests of recall. It is possible,
for example, that redundancy introduced
by meaningful relationships of the sort 
exhibited in the study of “clustering” 
(1) will produce a different pattern of 
organization, and hence may enable us 
to separate the effects of redundancy 
and reorganized order of emission. 


SUMMARY 


The results of two experiments on the im- 
mediate recall of verbal material are presented. 
For lists of words in which there is no sequential 
association between adjacent words (randomly 
arranged lists), Ss recall the individual items in 
order of their probability of being recalled. For 
these kinds of lists, the last items are recalled 
most frequently, the first items next most 
frequently, and the middle items least fre- 
quently. For passages of connected discourse, 
the order of recall is in the order with which the 
material is presented, and the serial position 
curve of frequency of recall is like that obtained 
by the method of serial anticipation with non- 
sense material (and roughly the mirror image of 
that obtained with free recall of nonsense ma- 
terial). It is probable that the serial position 
curve obtained with textual material depends 
upon the serial order of emission during recall. 
It was demonstrated by the use of the orders of 
approximation to textual English devised by 
Miller and Selfridge (3) that increasing the 
sequential dependency from zero to that 
characteristic of textual English changes the 
order and frequency of recall from those char- 
acteristic of free recall of disconnected material 
to those characteristic of serial anticipation. 
Thus, recall of sequentially dependent material 
involves more than the organization of words
into larger groups; it also involves the reorgani-
zation of the patterns of emission of responses
and changing the relative frequency with which 
items in various positions are recalled. 


REFERENCES

1. Bousfield, W. A., & Cohen, B. H. The occurrence of clustering in the recall of randomly arranged words of different frequencies-of-usage. J. gen. Psychol., 1955, 52, 83-95.

2. Bousfield, W. A., Cohen, B. H., & Silva, J. G. The extension of Marbe’s law to the recall of stimulus-words. Amer. J. Psychol., 1956, 69, 429-433.

3. Miller, G. A., & Selfridge, J. A. Verbal context and the recall of meaningful material. Amer. J. Psychol., 1950, 63, 176-187.

4. Raffel, G. Two determinants of the effect of primacy. Amer. J. Psychol., 1936, 48, 654-657.

5. Robinson, E. S., & Brown, M. A. Effect of serial position upon memorization. Amer. J. Psychol., 1926, 37, 538-552.

6. Siegel, S. Nonparametric statistics. N. Y.: McGraw-Hill, 1956.

7. Thorndike, E. L., & Lorge, I. The teacher’s word book of 30,000 words. N. Y.: Bureau of Publications, Teacher’s Coll., Columbia Univer., 1944.

8. Welch, G. B., & Burnett, C. T. Is primacy a factor in association-formation? Amer. J. Psychol., 1924, 35, 396-401.

(Received July 30, 1956)





Journal of Experimental Psychology 
Vol. 54, No. 3, 1957


THE INTERACTION OF FREQUENCY, EMOTIONAL TONE, 
AND SET IN VISUAL RECOGNITION ¹


SAMUEL C. FULKERSON ²
The University of Texas 


Recent work on the perception of 
need-related stimuli has indicated 
that a number of independent vari- 
ables such as frequency, recency, 
background, set, stress conditions, 
and drive-strength can affect the 
threshold to tachistoscopically pre- 
sented stimuli. Three variables 
(emotional tone, frequency, and set) 
are of particular relevance in the area 
of “perceptual defense,” each having 
been advanced as the primary factor 
determining recognition thresholds for 
stimuli involving negative affect. 
McGinnies (4) obtained higher recog- 
nition thresholds for taboo words than 
for non-taboo words, attributing this 
to a tendency for the taboo words to 
arouse an avoidance response. 
Howes and Solomon (2) and Postman 
(6) have pointed out that frequency 
differences among the stimulus words 
could account for the findings in most 
“perceptual defense” experiments.
Postman, Bronson, and Gropper (7), 
and more recently Freeman (1), have 
shown that set differences can affect 
the recognition thresholds to taboo 
words. 

It is, therefore, of interest to in- 
vestigate the interaction between 


1 This paper is based on part of a dissertation
[...] University of Texas in 1955 in partial fulfillment of
the requirements for the Ph.D. degree. I wish
[...] Dr. Wayne H. Holtzman, and the other members
of the committee, Drs. Karl M. Dallenbach, 
Harry Helson, and Philip Worchel. 

2 Now at the Department of Clinical Psy-
chology, School of Aviation Medicine, USAF, 
Randolph Air Force Base, Texas. 


these three variables. There is reason 
to expect interaction effects. Post- 
man and Schneider (9) in a related 
area found that value differences were 
not related to threshold differences 
when the value words were all of high 
frequency. If a general explanation 
is to cover the perception of all 
classes of need-related stimuli, then 
this interaction might be expected to 
hold between tabooness and fre- 
quency. 

The present study is an attempt to 
answer two questions regarding the 
perception of taboo words: (a) what 
is the relative importance of the 
above-mentioned variables in deter- 
mining the recognition threshold; and 
(b) how do these variables interact? 

Frequency was defined in terms of 
the values in the Thorndike-Lorge 
tables, emotional tone in terms of 
ratings of tabooness, and set was 
manipulated by varying both instruc- 
tions and pattern of stimulus pres- 
entation. Investigators have limited 
themselves to the manipulation of 
instructional set in “perceptual de-
fense” studies. However, the ex-
perimental task is characterized by 
the fact that several stimuli are 
presented in serial order. It is likely 
that recognition thresholds for a given 
class of stimuli will be high when the 
background effect, due to the sequence 
of presentation, is such that the 
expectation for that stimulus-class is 
low. In this experiment background 
was manipulated by varying the ratio 
of taboo to non-taboo words in the 
stimulus list. 




METHOD


Subjects.—The Ss were 120 volunteer male 
undergraduates at the University of Texas. All 
were native born, possessed of normal vision, 
and naive as to the nature of the experiment. 

Assigning tabooness and frequency values.—A 
group of 47 male undergraduates at the Uni- 
versity of Texas was presented a list containing 
120 words, including 85 which were thought to 
be socially unacceptable, and asked to rate the
words on a five-point scale, ranging from 
“completely acceptable socially” through “com- 
pletely unacceptable socially.” The mean 
ratings were used to place the words in three 
categories of social acceptability or tabooness: 
non-taboo, mildly taboo, and highly taboo.³
Within each degree of tabooness three levels of 
frequency, as measured by the Thorndike-Lorge 
General Summary Count (11), were distin- 
guished: low, medium, and high frequency. 

Groups.—To vary background differences, 
three lists of 30 words were constructed, each 
containing a different ratio of taboo to non-taboo 
words. Each list was presented tachistoscopically
to a different group of Ss as follows: Group
A, 40 Ss were presented a list of words containing 
24 taboo words and 6 non-taboo words; Group 
B, 40 Ss were presented a list of words containing 
15 taboo words and 15 non-taboo words; Group 
C, 40 Ss were presented a list of words containing 
6 taboo words and 24 non-taboo words. There
were 10 words in the lists which were invariant 
from group to group, and the analysis of the 
data is based on the threshold values for these 
10 words. The 10 words represent nine cells, 
the values for the two medium-frequency, 
non-taboo words (delve and furl) being pooled 
to obtain an average threshold value for that 
cell. The words are shown in Table 1, with 
their frequency and tabooness values. It should 
be noted that one of the high taboo words, 
“prick,” has a non-taboo meaning. This 
weakness of design seemed unavoidable, since 
no unequivocally taboo word was given a high- 
frequency rating in the Thorndike-Lorge tables. 
The fact that, except for four Ss, “prick” was 
not presented until after several other taboo 
words, seemed reason to expect that it would be 
generally taken to have its taboo meaning. 

Procedure.—The words were typed in capital 
elite type on a continuous roll of white paper. 
This roll was placed in a Gerbrands Mirror 
Tachistoscope, and recognition-thresholds were 

3 A list of these taboo words, giving their mean
ratings of tabooness, mean ratings of familiarity
(not discussed in this study), and Thorndike-Lorge
General Summary Count values is
available from the author.


TABLE 1

The Invariant Stimulus-Words with Their Assigned Mean Tabooness and
Thorndike-Lorge Frequency Values

Experimental Word | Frequency Category | Tabooness Category
Clasp             | High               | Non-taboo
Delve             | Medium             | Non-taboo
Furl              | Medium             | Non-taboo
[illegible]       | Low                | Non-taboo
Naked             | High               | Medium
Belch             | Medium             | Medium
Crud              | Low                | Medium
Prick             | High               | High
Whore             | Medium             | High
Turd              | Low                | High

[The columns giving the mean tabooness ratings and the Thorndike-Lorge
frequencies are not legible.]

Note. Mean tabooness ratings run from a possible low of 1.00 to high of
5.00. Thorndike-Lorge values are in terms of frequency per million
words. A value of zero was assigned if the word did not appear in their
list.


obtained for each word by the ascending method
of limits. The word was first presented at an
exposure time of .01 sec. For the first 20 trials
the time of exposure was increased by .01 sec.
on each successive trial. For the next 20 trials
an increase of .02 sec. was used. After that an
increase of .05 sec. was employed. This non-
linear series was adopted in order to equalize
as much as possible differences in practice effects
between Ss. The Ss who could not correctly
identify all of the words at an exposure speed of
1 sec. were not used in the experiment. Before
starting on the experimental list, three practice 
words were always presented to get Ss used to 
the apparatus and to level off practice effects. 
Within each of the three groups, the stimulus 
list for that group was presented in one order to 
20 Ss, and in the reverse order to the other 20 Ss. 
This was for the purpose of balancing out order 
effects. In addition, different starting points 
in the list were used to further insure that each 
of the 10 invariant words appeared equally often 
at the beginning, middle, or end of the list. 
There was one factor which could not be
equalized between the three sequences. The
mean serial position of the 10 invariant words
differed from group to group. The sequence of
stimulus presentation for Group A was designed
to maximize the expectation that the next word
would be a taboo word. Therefore, no non-
taboo word appeared until from three to six
taboo words had appeared. In Group C, on the 
other hand, an attempt was made to maximize 
the expectation that the next word would be a 
non-taboo word, and the six invariant taboo 
words were, therefore, shifted toward the end
of the presentation order, compared to sequences
in Groups A and B. We would expect, therefore,
that practice effects would cause a tendency for 
the invariant taboo words to have lowest 
thresholds for Group C. This tendency works 
against the hypothesized direction. 
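
The ascending series of exposure times described in the Procedure can be written out explicitly. The Python sketch below is one reading of that description and rests on assumptions: the first trial is taken as .01 sec., trials 2 through 20 each add .01 sec., trials 21 through 40 each add .02 sec., later trials add .05 sec., and the series stops at the 1-sec. limit mentioned for excluding Ss. The function name is hypothetical. Times are kept in hundredths of a second to avoid rounding error.

```python
def exposure_schedule(max_hundredths=100):
    """Exposure time (in hundredths of a sec.) for each successive trial."""
    schedule = []
    exposure = 1   # first presentation: .01 sec.
    trial = 1
    while exposure <= max_hundredths:
        schedule.append(exposure)
        # Step size applied to obtain the exposure for the next trial.
        if trial < 20:
            step = 1   # .01 sec. through trial 20
        elif trial < 40:
            step = 2   # .02 sec. through trial 40
        else:
            step = 5   # .05 sec. thereafter
        exposure += step
        trial += 1
    return schedule


schedule = exposure_schedule()
print(schedule[:5])    # first five exposures
print(schedule[19])    # exposure on trial 20
print(schedule[39])    # exposure on trial 40
print(len(schedule))   # trials needed to reach the 1-sec. limit
```

Under this reading the recognition threshold for a word is simply the exposure value on the first trial at which S reports it correctly.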

Instructional set was varied by telling half 
of the Ss within each of the three groups that 
they would be shown taboo words, and by not 
telling the other half anything about the nature 
of the stimuli. 


RESULTS


The treatment of the data is in the form of an analysis of
variance. Two independent variables enter into the analysis:
three groups, involving different proportions of taboo words in
the stimulus list; and two levels of instructional set. There are
also two variables which represent repeated measures on the Ss:
three levels of tabooness and three levels of frequency. Table 2
shows this analysis of variance. Not all terms are listed, since
some of the interactions were felt to have no clear meaning, even
if significant. As a supplement to this analysis, the mean values
for each of the independent cells are shown in Table 3 and the
mean values for the correlated variables are shown in Table 4.
The data entering into these tables are scores which represent a
logarithmic transformation of the original recognition-thresholds.
Each threshold value was expressed in hundredths of a second, and
this value was then transformed to its logarithm. This
transformation was necessary since Bartlett’s test for homogeneity
of variance indicated heterogeneity of variance beyond the .01
level of significance. The transformation did not completely
rectify this heterogeneity, which after transformation was still
significant at the .05 level. To take into account the fact that
the variance differences were likely to inflate some of the F
values, a conservative level of significance (the .01 level) was
adopted.


TABLE 2

ANALYSIS OF VARIANCE OF RECOGNITION-THRESHOLD SCORES

Source
  Group (G)
  Instructions (I)
  G × I
  Residual between Ss

  Frequency (F)
  Tabooness (T)
  F × T
  F × G
  [further interaction terms illegible]
  [illegible] × F × T
  Residual within Ss

[Numerical entries illegible.]
* .01 level of significance.
** .001 level of significance.


TABLE 3

CELL MEANS FOR THE INDEPENDENT VARIABLES

Instructions    Group A        Group B        Group C
Told            [illegible]    [illegible]    1.01
Not told        .90            1.05           1.03
Total           [illegible]    1.00           1.02
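The transformation described above (threshold in hundredths of a second, then its logarithm) can be sketched as follows. This is only an illustration: the text does not state the base of the logarithm, so base 10 is assumed here, and the function names are hypothetical.

```python
import math


def log_threshold(threshold_sec: float) -> float:
    """Convert a recognition threshold in seconds to a log score.

    The threshold is first expressed in hundredths of a second and then
    log-transformed. Base 10 is an assumption; the paper says only
    "its logarithm".
    """
    if threshold_sec <= 0:
        raise ValueError("threshold must be positive")
    hundredths = threshold_sec * 100.0
    return math.log10(hundredths)


def from_log_threshold(score: float) -> float:
    """Invert the transformation: log score back to seconds."""
    return (10.0 ** score) / 100.0
```

Under the base-10 assumption, a cell mean of 1.00 would correspond to a threshold of about .10 sec.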

Group differences.—The differences 
between the groups getting the three stimulus lists were
significant at the .01 level. The greater the number of taboo
words in the list, the lower the threshold value. Since the
primary purpose of varying the ratio of taboo to non-taboo words
was to test the effect on the thresholds for the taboo words, the
important source of variance is that representing the interaction
between Group and Tabooness. This interaction is significant
beyond the .001 level. The results are most clear-cut when the
extremes of group treatment and tabooness are examined. In
Group A the difference between the means for the high-taboo words
and non-taboo words gives a t of 2.07, which is significant beyond
the .05 level. That is, for the group in which the expectation of
taboo words is maximized, “perceptual vigilance” occurs. In
Group C the difference between the means for the high-taboo words
and non-taboo words gives a t of .43, which is not significant,
although the thresholds for the high-taboo words are higher than
for the non-taboo words, indicating a tendency toward “perceptual
defense.” While the tendency toward “perceptual vigilance” is so
strong in Group A that it causes a slight over-all tendency in
the same direction, Table 4 indicates that in only four of the
nine possible comparisons is there a tendency toward “perceptual
vigilance” and the tendency holds primarily among the
low-frequency words.


TABLE 4

CELL MEANS FOR THE CORRELATED VARIABLES

[Table body illegible; surviving row labels: High, Medium, Low, Total.]

Instruction effects.—There were no significant differences due
to the different instructions given to Ss. There was a slight but
insignificant tendency for the mean threshold values to be lower
for the 60 Ss who were told to expect taboo words. A separate
analysis was carried out using only the first taboo words to
appear, since it was felt the effect of instructional set on
thresholds would be maximal for the first taboo word. This
analysis also gave insignificant results. The Frequency ×
Instructions interaction seemed due to the fact that frequency
had little effect on threshold for those Ss told to expect the
presentation of taboo words, but among those Ss who were not told
what to expect the mean threshold for the low-frequency words was
significantly greater than for the high-frequency words. The
effect of frequency on thresholds was in the same direction for
both instruction groups.

Frequency effects.—There were significant
differences in threshold values
between the three frequency levels. 
The higher the frequency of the word, 
the lower its threshold. The lack of 
interaction between group and fre- 
quency indicates that background 
differences due to varying the pro- 
portion of taboo words did not affect 
the influence of frequency on 
thresholds. 

Tabooness effects.—There were no 
significant over-all threshold differ- 
ences between the three levels of 
tabooness. However, the effect of 
tabooness is confounded by the fact 
that both group and frequency deter- 
mine the effect the tabooness variable 
will have. The Group × Tabooness interaction has already been
described. The Frequency × Tabooness interaction was significant
beyond the .001 level. An examination of the means of the cells
involved in the interaction indicates that for the high-frequency
words, the higher the tabooness the higher the
recognition-thresholds; that is, there is a tendency toward
“perceptual defense” within the high-frequency words. For the
medium-frequency words there is no trend. For the low-frequency
words the higher the tabooness the lower the
recognition-thresholds; that is, “perceptual vigilance.” From the
knowledge that there is also a Group × Tabooness interaction it
would be deduced, as an examination of Table 4 verifies, that the
tendency toward perceptual defense in the high-frequency words is
most marked in Group C, and the tendency toward perceptual
vigilance in the low-frequency words is most marked in Group A.


DISCUSSION


Two of the significant interactions in 
this study can be related to results in 
other areas of motivated perception. 
The tendency toward “perceptual vigi- 
lance” occurred within the low-frequency 
words but not within the high-frequency
words, and this is comparable to the 
Postman and Schneider (9) results for 
value words. The emotional tone of the 
taboo words affected recognition thresh- 
old as a function of the proportion of 
taboo words in the stimulus list, the 
threshold for taboo words being raised 
when the expectation of taboo words was 
a minimum. Postman and Crutchfield 
(8) presented words with letters missing 
in such a way that either a food or non- 
food word could be made. They con- 
cluded, “The perceived value of a given 
stimulus-object depends upon the num- 
ber and magnitudes of similar objects 
which have preceded it and which deter- 
mine S’s adaptation-level” (8, p. 211). 
For the purposes of this experiment their 
formulation could be paraphrased to read
that the recognition-threshold of a given
stimulus depends upon the number of 
similar stimuli which have preceded it. 
All three independent variables seemed 
to affect the recognition thresholds sig- 
nificantly. However, it is necessary to 
ask whether the effects which have been 
attributed to tabooness can be reduced 
to frequency differences. The argument 
for reduction might run like this: it 
appears that there is an increasing 
tendency toward “perceptual defense” 
going from low- to high-frequency words. 
Postman, Bronson, and Gropper (7) 
have pointed out that the frequency of 
taboo words is likely to be underesti- 
mated by the Thorndike-Lorge frequency 
tables. All else being equal, this would 
tend to produce lower thresholds for the 
taboo words than for the non-taboo 
words; i.e., “perceptual vigilance.” 
However, the classical relationship be- 
tween frequency and associative strength 
is in the form of a negatively accelerated 
curve such that, beyond a certain point, 
increases in frequency cannot appre- 
ciably affect associative strength because 
it has approached its upper limit. This 
suggests that a ceiling effect is in- 
creasingly masking the influence of 
frequency differences on the recognition 
thresholds as higher frequency words are 
used. This explains why, within the 
high-frequency words used in this study 
where the influence of frequency is at a 
minimum, there is no significant difference between taboo and
non-taboo thresholds. The slightly higher thresholds for taboo
words can be explained by noting that practically never in this
experiment was a taboo word given as a prerecognition guess.
Solomon and Postman (10) have shown that prerecognition guesses
tend to be high-frequency words. This means that the
high-frequency non-taboo words were high in the hierarchy of
probable prerecognition guesses, while the taboo words were not.
This would tend to produce lower thresholds for the non-taboo
words, all else being equal. The striking tendency toward
“perceptual vigilance” among the low-frequency words can be
attributed to the greater familiarity of taboo words and also may
be related to the finding of Lawrence and Coles (3), who
demonstrated that lower thresholds occur when the stimulus is one
of a known set of limited alternatives. Since taboo words
represent a relatively small and distinct class of meaning, once
S learns to anticipate taboo words, this narrows the range of
probable responses.

Two facts can be arrayed which sug- 
gest that the tabooness of the words 
determined the response above and 
beyond frequency and set. (a) The 
tendency for Ss to avoid taboo words as 
prerecognition guesses indicates that the 
tabooness of the words was interfering 
with the response. While this might 
have resulted from a repressive mecha- 
nism, such as is implied by the theoretical 
concept of “perceptual defense,” spon- 
taneous remarks of Ss suggest that it was 
more likely to have been the result of 
deliberate suppression. (b) Background
differences from group to group did not 
result in significant changes for the non- 
taboo words but did for the taboo words. 
If background differences affected ex- 
pectation irrespective of tabooness, we 
would expect that the thresholds for the 
non-taboo words for Group C would be 
lowered as much, relative to the taboo 
words, as the taboo word thresholds were 
lowered compared to the non-taboo 
thresholds in Group A. But background 
differences had no effect on the threshold 
values for the non-taboo words. In 
order for background to make a differ- 
ence, there must be discrimination of 
the dimension that is being varied. It 
seems reasonable that non-tabooness is 
not as clear a meaning class as tabooness. 

The possibility must also be considered 
that the effects attributed to both 
tabooness and frequency are due to 
differences in word configuration. In 
the selection of the stimulus words an 
attempt was made to equate the words 
on several dimensions of configurational 
similarity. Nevertheless, the factor of configuration is not
ruled out, particularly since there is only one word per cell in
all but one of the cells. However, this factor has been
discounted because of the fact that an earlier pilot study, which
is not reported here, gave essentially the same results using
mostly different words.

The defense that has been made for 
considering the dimension of tabooness 
independent of frequency has not been 
based on the same conception of the 
effect of tabooness held by McGinnies 
(4), who regarded the effect of taboo 
words as due to the fact that they 
aroused an avoidance response. In the 
argument which has been presented in 
this paper the effects of tabooness have 
been attributed to the fact that tabooness 
may be regarded as the affective aspect 
of the meaning of the stimulus words. 
The explanatory use of “meaning” is believed to be similar to the
use Postman (6) has made of the concept of “emphasis.” The word
“meaning” has been chosen over “emphasis” because of a bias
toward the position held by Woodworth, who said, “A theory held
by the present writer regards perception as a kind of response,
and places much emphasis on the central factors of set and
meaning” (13, p. 624).

It is interesting to note that the attempt to reduce findings in
“perceptual defense” studies to frequency differences is similar
to a controversy with which meaning theories have had to wrestle.
Underwood (12) maintained that quantitatively meaningfulness and
familiarity were not operationally different. However, Noble (5)
was able to demonstrate a nonlinear relationship between meaning
and familiarity, when meaning was defined in terms of the number
of associations made to a word in a stipulated unit of time. The
present discussion has attempted to explain “perceptual defense”
in terms of set, frequency, and meaning, making the same
differentiation which Noble has made between frequency and
meaning, but in a different area.

SUMMARY 


Words differing in tabooness and frequency
were tachistoscopically presented under two
conditions of instructional set and three
background conditions to 120 undergraduate males.
An interaction between tabooness and background
was found. The mean threshold for
non-taboo words was not affected significantly 
by differences in the proportion of taboo words 
in the stimulus list, but the mean threshold for 
the taboo words was low when the proportion of 
taboo words was high, and high when the pro- 
portion of taboo words was low. 

There was also an interaction between 
tabooness and frequency, such that there was a 
consistent trend toward “perceptual defense” 
among high-frequency words, and a trend 
toward “perceptual vigilance” among the low- 
frequency words. 

An explanation of these results was made in
terms of set, frequency, and meaning.


SIMPLE REACTION TIME AS A FUNCTION OF
TIME UNCERTAINTY¹


EDMUND T. KLEMMER 
Operational Applications Laboratory, Air Force Cambridge Research Center 


An earlier study (3) showed that 
simple reaction time (RT) varies 
with S’s uncertainty about time of 
stimulus occurrence. This time un- 
certainty is a function of both the 
mean duration of the time (fore- 
period) between a warning signal and 
the stimulus and the variability
within the series of foreperiods.
Foreperiod variability adds uncertainty
directly, and mean foreperiod
is important since S’s ability to
predict time of stimulus occurrence is
very much a function of the length of
time he must predict.

In the previous report the two
sources of time uncertainty were
considered separately because no
single measure was available. The
present experiment illustrates, with
new data, a method by which all of 
S’s time uncertainty can be expressed 
as a single number and reaction time 
plotted as a single-valued function of 
time uncertainty. In addition, time 
uncertainty is expressed in terms of 
the information measure. 

In order to estimate the amount of 
time uncertainty due to S’s imperfect 
time-keeping ability, it is necessary 
to run time-interval prediction tests 
with intervals equal to the mean 
foreperiods of the reaction time tests. 
The variance of the distribution of
each S’s times of response in the time
prediction test is taken as a measure
of his “subjective” time uncertainty
for intervals equal to the predicted
interval. Total time uncertainty is
obtained by adding this measure of
subjective time uncertainty to the
variance of the distribution of foreperiods
in the RT test having a mean
foreperiod equal to the time interval
used in the prediction test. This
total variance can be converted to a
nondimensional informational measure
of uncertainty which makes
possible a comparison of the present
results with RT tests involving a
choice reaction.

¹ This research was performed at the Operational
Applications Laboratory, Air Force
Cambridge Research Center, Bolling Air Force
Base, Washington 25, D. C., in support of
Project 7682. This is AFCRC TR 56-1.

The present study, then, consists
of two separate test series, one RT
and one time prediction, given to the
same Ss. The results of the two
series are combined in such a way
that a single-valued plot of RT as a
function of time uncertainty is de- 
rived for each S. 


METHOD


RT apparatus.—The apparatus consisted of 
an NE 51 neon stimulus bulb, a response key, 
and a warning click device. The .02-sec. 
stimulus was clearly visible in the dimly lighted 
room. In another room a teletype tape pro- 
grammer presented 11 different lengths of fore- 
period in a random order as described below. 
The warning click occurred regularly every 10 
sec. Each RT was measured to the nearest 
millisecond by a Berkeley Universal counter and 
timer, Model 5510, and printed out auto- 
matically. 

RT procedure.—Ten different RT tests were 
used, each with a different mean foreperiod and/ 
or foreperiod variability. These tests are de- 
scribed in Table 1. Each S took single runs on 
each of the 10 tests in reversing order 1 → 10,
10 → 1 until he had taken one practice run and
five experimental runs on each test. Three Ss 
began with Test 10 and two began with Test 1. 
Each run consisted of 51 stimulus presentations.


TABLE 1

DESCRIPTION OF REACTION TIME TESTS

Test    Foreperiod Characteristics (Sec.)
        Mean    SD

[Table body illegible; the only surviving entry is 0.5.]

Note. The variable foreperiods were chosen randomly
from controlled frequencies in the shape of a
normal distribution having a range of ± 2.2 SD.

The Ss were always informed of the range of 
foreperiods before each run, and in addition, the 
first three trials in the test demonstrated the 
range. The first foreperiod of the test was
always the longest for that test, the second
foreperiod the shortest, and the third foreperiod
the mean foreperiod for that test. These first
three RT’s were omitted from the analysis. The
remaining 48 foreperiods were randomly ordered 
from the normal distribution of foreperiods 
described in Table 1. Note that this is a change 
from the previous study in which the frequency 
distribution of foreperiods was rectangular. In 
the constant foreperiod tests, 51 constant fore- 
periods were used in each run and the first 3 
RT’s were omitted from the analysis. 
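The structure of one run, as described above, can be sketched in a few lines. This is a rough illustration rather than the programmer tape actually used: the equal spacing of the 11 foreperiod lengths and the use of weighted random sampling (instead of the exact "controlled frequencies") are assumptions, and all names are hypothetical.

```python
import math
import random


def foreperiod_run(mean, sd, n_levels=11, n_random=48, span_sd=2.2, seed=None):
    """Return the 51 foreperiods (sec.) for one run.

    Trials 1-3 demonstrate the range: longest, shortest, then mean
    foreperiod. The remaining trials are drawn from discrete levels
    spanning mean +/- span_sd * sd, weighted by the normal density.
    Equal spacing of levels and weighted sampling are assumptions.
    """
    rng = random.Random(seed)
    if sd == 0:
        # Constant-foreperiod test: every trial uses the same foreperiod.
        return [mean] * (n_random + 3)
    step = 2 * span_sd * sd / (n_levels - 1)
    levels = [mean - span_sd * sd + i * step for i in range(n_levels)]
    weights = [math.exp(-0.5 * ((x - mean) / sd) ** 2) for x in levels]
    body = rng.choices(levels, weights=weights, k=n_random)
    return [levels[-1], levels[0], mean] + body
```

For example, `foreperiod_run(2.0, 0.5, seed=1)` yields 51 values whose first three entries are the longest, shortest, and mean foreperiods; in an analysis those three trials would be dropped.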

In all tests, S was instructed to respond as 
soon as possible after the stimulus light, but 
never before. If a response occurred before the 
stimulus in any run, the run was halted and 
started over. This method produced con- 
siderably less than 1% anticipations, none of 
which occur in the runs reported here. 

Prediction apparatus.—The prediction ap- 
paratus used a warning click every 10 sec. and 
a response key similar to the RT apparatus. 
No stimulus light was used, however. Instead, 
a small light would flash at the instant S pressed 
the key. This light varied in position according 
to how long after the warning click the key was 
pressed. If the key was pressed at exactly the 
instructed interval after the click, the light 
would appear on an index mark; if pressed too 
soon, the light appeared to the left of this mark; 
if pressed too late, to the right. The distance 
in each case was proportional to the error and so 
S received immediate knowledge of direction and 
magnitude of error after each response. The
interval between each click and prediction
response was recorded from a Standard Electric 
timer. 

Prediction procedure.—Tests were given using 
five prediction intervals: 4, 1, 2, 4, and 8 sec. 
Each S took single runs of 50 stimuli each on 
each test in reversing order 4 → 8, 8 → 4, until
he had completed one practice and five experimental
runs on each test. Three Ss began with
the 8-sec. test and two began with the }-sec. 
test. The Ss were always informed of the 
correct interval before each test and given at 
least four “warm-up” predictions followed 
without interruption by the 50 clicks, 10 sec. 
apart for the test run. 

In all tests, S was instructed to make a 
prediction after each click and attempt to make 
the light appear as close as possible to the index 
mark. 

Subjects.—The Ss were two university
students and three laboratory personnel.
Subjects K, B, and C had previous RT training;
subjects G and W did not. All Ss took the RT
tests before the time prediction tests.

RESULTS AND DISCUSSION


The total uncertainty of time of 
stimulus occurrence was computed 
for each S separately for each of the 
10 RT tests. The assumption made 
is that S’s uncertainty about the time
of stimulus occurrence in any test may 
be estimated by adding the variance 
of the distribution of actual fore- 
periods to the variance of his own 
predictions of time intervals equal to 
the mean foreperiod. Reaction-time 
Tests 1-5 have no foreperiod vari- 
ability so that all of the time un- 
certainty is accounted for by this 
estimate. Tests 6-10 have contri- 
butions from both foreperiod vari- 
ability and S’s subjective time un- 
certainty. Test 10 uses a mean 
foreperiod of 5 sec. for which pre- 
diction data was not taken but the 
SD of prediction is a nearly linear 
function of prediction interval in this 
region, and so the variance for 5-sec. 
prediction is obtained by interpolating 
SD’s between 4 and 8 sec. 
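The interpolation step for Test 10 can be illustrated with a short sketch. The SD values in the usage note are hypothetical placeholders, not data from the study; only the procedure (interpolate the SD linearly, then square it to get a variance) follows the text.

```python
def interpolated_prediction_variance(t, t_lo, sd_lo, t_hi, sd_hi):
    """Variance of time predictions at interval t (sec.).

    The SD is interpolated linearly between the SDs observed at the two
    neighboring prediction intervals, and then squared.
    """
    frac = (t - t_lo) / (t_hi - t_lo)
    sd = sd_lo + frac * (sd_hi - sd_lo)
    return sd ** 2
```

With made-up SDs of .20 sec. at 4 sec. and .40 sec. at 8 sec., the interpolated SD at 5 sec. would be .25 sec., giving a variance of .0625.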

The total time uncertainties were converted to SD’s and these
SD’s used as the time uncertainty dimension of the plots in
Fig. 1. Time uncertainty ($\sigma_T$) appears on a log scale
along the abscissa. Mean RT is plotted on a linear scale along
the ordinate. The filled dots represent RT Tests 1-5, reading
from left to right, and the open dots represent Tests 6-10, also
reading from left to right. Each point represents the mean of 5
runs of 48 stimuli each.

Fig. 1 (caption, partly illegible). $\sigma_T$ is S’s time
uncertainty, given as an SD, and found by adding foreperiod
variance to the variance of S’s estimates of intervals equal to
the mean foreperiod. $H_T$ is time uncertainty in bits relative
to a constant 1-sec. foreperiod. Straight lines are fitted by
least squares.


Time uncertainty is expressed in terms of the informational
measure (bits) along the upper abscissa scale of Fig. 1. Since
time is a continuous variable, the stimulus uncertainty measure
must be relative to some standard distribution² (2). Shannon’s
formulas (4) assume that the standard distribution is a uniform
distribution over one unit of the variable, but for the present
data it has seemed better to take as the standard each S’s own
time uncertainty for a constant foreperiod of 1 sec. Thus the
zero value of informational uncertainty is placed directly over
the second filled dot corresponding to Test 2, which uses a
constant foreperiod of 1 sec. The other points along the scale
of presented information are found in the following manner. The
SD, $\sigma_T$, which is taken as the measure of total time
uncertainty, is assumed to be based on a normal distribution
since it is the result of adding variances from foreperiod and
time-prediction distributions which are approximately normal in
shape. The uncertainty in bits to be associated with a normally
distributed random variable is given by Shannon (4):

$$H(x) = \log \sqrt{2\pi e}\,\sigma \tag{1}$$

If the Shannon formula were used directly with $\sigma_T$ in
seconds, the uncertainty values obtained would be relative to a
uniform distribution over 1 sec. In order to make the
informational uncertainty measure relative to the distribution
of estimates of a 1-sec. interval, it is necessary to use an
arbitrary unit of time for measuring $\sigma$. These new time
units will be a linear function of $\sigma_T$, so that we may
write:

$$H_T = \log k\sigma_T + \log \sqrt{2\pi e} \tag{2}$$

in which $\sigma_T$ is still in seconds, but $H_T$ is relative
to the desired standard distribution if $H_T = 0$ when
$\sigma_T = \sigma_1$, where $\sigma_1$ is the SD of S’s
estimates of a 1-sec. interval. By substitution:

$$0 = \log k\sigma_1 + \log \sqrt{2\pi e}, \qquad H_T = \log \sigma_T - \log \sigma_1 \tag{3}$$

For uncertainty in bits relative to a constant 1-sec. foreperiod:

$$H_T = \log_2 \sigma_T - \log_2 \sigma_1 \tag{4}$$

This equation is used to compute the upper abscissa scale in
Fig. 1. The negative informational uncertainties to the left of
zero represent less time uncertainty than the standard
one-second foreperiod.

² Information transmission values are absolute rather than
relative, even when based upon continuous distributions.
Transmission is discussed later.

The straight lines in Fig. 1 are fitted to the points by the
least mean square method. The line for the “All Subj.” plot is
fitted to the total array of points as plotted in the individual
graphs. For the sake of clarity, the individual S points are not
repeated in the combined plot. The zero point of the
informational scale on the combined plot is based on the average
variance of response times in the one-second prediction tests.

Product-moment correlation between mean RT and time uncertainty
in bits (or in log $\sigma_T$) varies from .956 to .983 among
the five Ss. When the data from all Ss are pooled, the
correlation drops to .905 because of the large difference in
slope of regression line among the Ss. This slope varies from
12 msec. per bit to 24 msec. per bit over Ss. The pooled data
show a slope of 18 msec. per bit with an equation in terms of
time uncertainty given as an SD of:

$$RT = .018 \log_2 \sigma_T + .235 \tag{5}$$

where $\sigma_T^2 = \sigma_S^2 + \sigma_F^2$, $\sigma_F^2$ is
foreperiod variance, and $\sigma_S^2$ is variance of S’s own
estimates of interval equal to mean foreperiod, with all values
in seconds.
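As a rough numerical companion to Equations 4 and 5, the sketch below combines the two variances into a total SD, expresses it in bits relative to the 1-sec. standard, and evaluates the pooled regression line. The coefficients .018 and .235 are the pooled values quoted in the text; the function names and any example inputs are illustrative assumptions only.

```python
import math


def total_sd(foreperiod_var, prediction_var):
    """Total time uncertainty as an SD (sec.): root of the summed variances."""
    return math.sqrt(foreperiod_var + prediction_var)


def uncertainty_bits(sigma_t, sigma_1):
    """Equation 4: uncertainty in bits relative to a constant 1-sec. foreperiod.

    sigma_1 is the SD of S's estimates of a 1-sec. interval.
    """
    return math.log2(sigma_t) - math.log2(sigma_1)


def predicted_rt(sigma_t, slope=0.018, intercept=0.235):
    """Equation 5 (pooled data): mean RT in seconds from sigma_t in seconds."""
    return slope * math.log2(sigma_t) + intercept
```

Doubling the total SD adds one bit of uncertainty and, on the pooled line, about 18 msec. of reaction time; a total SD equal to `sigma_1` gives exactly zero bits.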

An analysis of variance of the data 
from each S separately showed no 
significant variance due to deviations 
from linear regression between RT 
and time uncertainty. This finding,
together with the high linear correlations,
suggests that RT is a linear
function of time uncertainty in the 
range of this study. 
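The two statistics behind this conclusion, a least-squares line and a product-moment correlation, can be sketched with the standard formulas. The function names are hypothetical, and any points passed in would be illustrative; none of the study's individual data points are reproduced here.

```python
import math


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(xs, ys):
    """Product-moment correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold time uncertainty in bits and `ys` mean RT in seconds; points lying exactly on a line with slope .018 return that slope and a correlation of 1.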

Several investigators have studied the 
relation between RT and stimulus un- 
certainty in situations in which there was 
uncertainty about which of several 
stimuli will be presented. In general, 
they have also found a linear relation 
between RT and stimulus uncertainty 
in bits, but the slopes have varied widely 
(1). No slope, however, has been as 
small as the 12-24 msec. per bit found 
in the present study. The differences in
slope are due to such things as stimulus- 
response compatibility and dimension- 
ality. More work needs to be done in 
these areas. 

The next question that arises concerning time uncertainty in
informational terms is how much information about time of
stimulus occurrence S actually transmits.³ Transmitted
information may be approximated by the difference between S’s
prestimulus uncertainty about when the stimulus will occur and
the residual uncertainty about time of occurrence as estimated
from his response. If we neglect the slight relation between
foreperiod and RT, and assume normal distributions, this
calculation involves only taking the logarithm of the ratio of
the time uncertainty expressed as an SD ($\sigma_T$) and the SD
of the corresponding RT distribution. The constant factor which
makes the information measure relative drops out in this ratio
so that transmitted information is an absolute score. The
equations for this approximation are given below.

As before, S’s average informational uncertainty about when each
stimulus will occur is given by:

$$H_T = \log_2 \sigma_T - \log_2 \sigma_1 \tag{6}$$

By the method shown above, the average informational uncertainty
about time of stimulus occurrence based upon knowledge of the
response time is given by:

$$H_{RT} = \log_2 \sigma_{RT} - \log_2 \sigma_1 \tag{7}$$

Transmitted information is given by:

$$T = H_T - H_{RT} = \log_2 (\sigma_T / \sigma_{RT}) \tag{8}$$

³ Tests 1-5 present no information in clock time to be
transmitted but do involve considerable subjective time
uncertainty which is reduced by stimulus occurrence.
Transmission has been measured in terms of this reduction in
uncertainty.


The above method of measuring trans- 
mitted information is different from a 
straightforward application of the usual 
transmission formulas which would con- 
sider uncertainties in clock time only. 
For tests with considerable clock-time 
uncertainty in the stimulus, the two 
methods give essentially the same results, 
but for any test with constant foreperiod, 
the direct application of transmission 
formulas would show zero transmission. 
The inclusion of subjective time un- 
certainty in the present study gives a 
more accurate picture of the actual 
informational demands on the human 
operator. 
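The transmission measure reduces to a single logarithm of a ratio, which the following sketch makes explicit. The function names are hypothetical; the sketch simply restates Equations 6 through 8 and shows that the reference SD for the 1-sec. standard cancels.

```python
import math


def relative_bits(sigma, sigma_1):
    """Uncertainty in bits relative to the 1-sec. standard (form of Eq. 6 and 7)."""
    return math.log2(sigma) - math.log2(sigma_1)


def transmitted_bits(sigma_t, sigma_rt):
    """Equation 8: transmitted information in bits per stimulus.

    sigma_t is the total time uncertainty (SD, sec.) before the stimulus;
    sigma_rt is the SD of the corresponding RT distribution (sec.).
    """
    return math.log2(sigma_t / sigma_rt)
```

Because `relative_bits(sigma_t, s1) - relative_bits(sigma_rt, s1)` equals `transmitted_bits(sigma_t, sigma_rt)` for any positive `s1`, the score is absolute rather than relative to the chosen standard.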

In the present data $\sigma_T$ varies over a 20-to-1 range over
tests while the SD of the RT distributions varies within only a
2-to-1 range. This means that time uncertainty of the stimulus
determines the slope of the RT versus information transmitted
functions which, therefore, are very similar to the RT vs.
log $\sigma_T$ functions as plotted in Fig. 1.
Of more interest is the absolute value of 
information transmitted. The peak 
transmission occurred in Test 10 and 
varies very little over Ss: 5.37 to 5.60 
bits per stimulus. The highest ratio 
between information transmission and 
RT results from a transmission of 5.49 
bits with a RT of 222 msec. in Test 10. 
The lowest transmission occurs in Test 1 
where S has little uncertainty about time 
of stimulus onset. The smallest ratio 
here involves a transmission of .86 bits 
with a 157-msec. RT. Interestingly, 
the lowest and highest ratios were both 
achieved by the same S.

Note that the information transmis- 
sion values cannot be compared to the 
stimulus time uncertainty measured in 
bits. The transmission scores are absolute,
but the stimulus uncertainty measure
is relative to an arbitrary standard
distribution.


SUMMARY 


Five Ss were given a set of simple RT tests 
specifically designed to test the hypothesis that 
a single-valued relation could be obtained 
between RT and the time uncertainty of the 
stimulus. This relation was shown to be ap- 
proximately linear when time uncertainty is 
plotted as an informational measure. The 
slope of the RT-time uncertainty function 
averaged 18 msec. per bit of stimulus uncertainty,
which is less than the slope arising from
RT experiments involving choice among several 
stimuli previously reported. Information trans- 
mitted in the time domain varied from less than 
one to more than five bits per stimulus over the 
10 tests. 


. Bricker, P. D. 


. Shannon, C. E.


EFFECTS OF DELAY OF INFORMATION FEEDBACK AND
TASK COMPLEXITY ON THE IDENTIFICATION
OF CONCEPTS¹


LYLE E. BOURNE, JR. 


University of Utah 


Underwood (14) has suggested that 
time, with particular reference to the 
contiguity of stimuli and appropriate 
responses, is perhaps the most im- 
portant variable in problem-solving 
behavior. He assumes that, in order 
for relationships between stimuli to be 
perceived and acquired, correct re- 
sponses to those stimuli must be 
contiguous. This assumption may be 
interpreted to yield the prediction 
that delaying information to S about 
the correctness of his response should 
inhibit performance in a complex 
discrimination task such as concept 
identification. Delay of information 
feedback would reduce the likelihood 
of stimulus-correct response conti- 
guity, since the greater the delay 
interval the more S must rely on 
memory to maintain the stimulus 
across the time gap. 

The effect of delay of information 
feedback or reinforcement on per- 
formance has received much attention 
from learning theorists. A large 
number of well-controlled studies 
(e.g., 6, 9, 10) have appeared in the 
literature. The efforts of most in- 
vestigators, however, have thus far 
been exclusively directed toward the 
learning of lower animals. Before 
generalizations from these data to 
human behavior are justified, some 

1 A condensation of a doctoral dissertation 
which is on file in the Library of the University 
of Wisconsin. The author is indebted to 
Professors E. J. Archer, W. J. Brogden, and H. 
Leibowitz under whose direction the dissertation 
was prepared. The research was supported in 
part by a grant from the Research Committee 
from funds provided by the Wisconsin Alumni 
Research Foundation. 
experimentation with the delay vari- 
able should be carried out with human 
Ss. Recently, Perkins, Banks, and 
Calvin (11) reported a study of the 
effects of delay on the performance of 
children on simultaneous and suc- 
cessive discrimination problems. 
These Es used only two delay 
intervals, 0 and 5 sec. Although 
delay was not a significant source of 
variance in the analysis, its effect was 
in the predicted direction. There 
was no significant delay by problem 
interaction. 

The purpose of the present experi- 
ment is to investigate the effect of 
delay of information feedback and of 
task complexity, graduated in terms 
of the amount of irrelevant infor- 
mation within the stimuli of the 
problem, on performance in a concept 
identification task. The effect of 
task complexity has recently been 
investigated by Archer, Bourne, and 
Brown (1). The results of this study 
indicated that performance degrades 
as a positively accelerated function of 
the amount of irrelevant information. 
The interaction of complexity and 
delay will be of particular interest. 
A statistically significant interaction 
would indicate a nonadditive rela- 
tionship between the variables and 
imply that the effect on human 
problem-solving behavior of delaying 
information feedback depends in a 
nonlinear fashion on the complexity 
of the problem. 


PROCEDURE 


Subjects.—The Ss were 162 students in 
elementary psychology courses. Each was 
assigned randomly to one of 18 treatment 
combinations and served individually for one 
session. Each S was presented with detailed 
tape-recorded instructions as to the nature of 
the task, the operation of his controls, the 
meaning of the information lights and the 
criterion of problem solution. 

Task.—During the experiment S was pre- 
sented with a list or series of geometric patterns. 
Each pattern was one of the possible stimulus 
combinations within the limits of the complexity 
of that problem. When each pattern appeared, 
S was required to press one of four response keys, 
placed directly in front of him, to identify the 
category to which that pattern belonged. Since 
the keys bore no markings, S had to determine 
the significance of each by trial and error. The 
four keys corresponded to the four possible 
combinations of the two stimulus dimensions 
which were relevant to the solution of the prob- 
lem, e.g., if size and form were the relevant 
dimensions, the four categories (keys) would be 
large square, large triangle, small square, and 
small triangle. A dimension is relevant if it is 
necessary for the correct classification of the 
patterns. An irrelevant dimension is defined 
as one which appears at each of its two levels in 
the stimuli of the problem but cannot con- 
sistently be used to correctly classify the pat- 
terns. If a dimension is neither relevant nor 
irrelevant, only one level of that dimension 
would appear in the series of stimulus patterns. 
For example, if the size dimension were neither 
relevant nor irrelevant in a particular problem, 
all patterns would be large or small figures but 
not both. 

A modified noncorrection procedure was used. 
After S made one response he was informed of 
the correct response for that pattern by the 
lighting of a 10-w. lamp directly above and about 
4 in. from the correct key for the given stimulus 
pattern. Immediately upon S’s response, the 
screen upon which the stimulus pattern had been 
presented would become blank for the next 10 
sec. A delay interval, from 0 to 8.0 sec., was 
introduced between S’s response and the onset 
of the information light. The information light 
always remained on for 1 sec. 

The criterion of problem solution was 32 
consecutive correct identifications. The task 
was self-paced in that S was allowed as much 
time as he needed to make his response to any 
pattern. 

Lists.—A total of nine strip film series of 
patterns was used—three at each level of ir- 
relevant information. There were three basic 
problems, a problem being defined as the four 
combinations of two particular two-level 
relevant dimensions. A total of seven di- 
mensions were used: color (red or green), form 
(triangle or square), number (one or two), size 
(large or small), orientation (upright or tilted), 
horizontal position (left or right side of screen), 
and vertical position (top or bottom of screen). 
The three problems selected were: (A) orienta- 
tion-form, (B) vertical position-size, and (C) 
color-number. 

Since each stimulus could take on one of the 
two levels of a given dimension, the amount of 
information in the stimuli could be quantified in 
bits of information by evaluating log2 of the 
number of equally probable alternative stimuli 
in the problem being presented. If in Problem 
A, one dimension—say color—were irrelevant, 
all of the patterns presented might be single 
large figures appearing in the lower right 
quadrant of the screen with eight variations; 
the figures could be squares or triangles, upright 
or tilted, and red or green. There would thus 
be 3 bits of information in the stimuli, only 2 of 
which would be relevant to the solution of the 
problem. More irrelevant information could 
be added by introducing variations in any or all 
of the other dimensions. Irrelevant information 
was added in the present experiment by ran- 
domly selecting the necessary number of di- 
mensions from those which were not relevant. 
At each level of task complexity, all possible 
patterns were used. The order of possible 
patterns within a problem was determined by a 
semi-random procedure. Restrictions on ran- 
domness were that no pattern may follow itself 
in the series and that each appears equally often 
in every 128 patterns. 
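
The quantification described above can be illustrated with a minimal Python sketch, offered as an aid to the reader rather than as the procedure actually used to build the film strips. It enumerates every pattern generated by the dimensions allowed to vary and takes log2 of the count. The dictionary keys and function names are assumptions introduced here; the dimension levels are those listed in the text.

```python
from itertools import product
from math import log2

# The seven two-level dimensions listed in the text.
DIMENSIONS = {
    "color": ("red", "green"),
    "form": ("triangle", "square"),
    "number": ("one", "two"),
    "size": ("large", "small"),
    "orientation": ("upright", "tilted"),
    "horizontal": ("left", "right"),
    "vertical": ("top", "bottom"),
}


def stimulus_patterns(varying):
    """All patterns obtained when only the named dimensions vary."""
    levels = [DIMENSIONS[name] for name in varying]
    return [dict(zip(varying, combo)) for combo in product(*levels)]


def information_bits(varying):
    """log2 of the number of equally probable alternative patterns."""
    return log2(len(stimulus_patterns(varying)))


# Problem A (orientation-form) with color as the single irrelevant dimension.
relevant = ["orientation", "form"]
irrelevant = ["color"]
patterns = stimulus_patterns(relevant + irrelevant)
total_bits = information_bits(relevant + irrelevant)
irrelevant_bits = total_bits - information_bits(relevant)

print(len(patterns), total_bits, irrelevant_bits)
```

Letting all seven dimensions vary gives 2 relevant plus 5 irrelevant bits, i.e., 128 distinct patterns, which agrees with the block length named in the ordering restriction.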

Apparatus.—The stimulus patterns were 
presented visually to S, being projected onto a 
milk-glass screen (11.25 X 7.5 in.) mounted at 
about eye level. This screen was set in an 
opaque partition (34 X 33 in.). The patterns 
were presented with a Dunning Animatic strip 
film projector. When projected onto the screen, 
large patterns were 1.5 in. on a side, small ones 
.75 in. The S was seated about 2 ft. from the 
screen. 

Each of the four keys used by S was con- 
nected to a light in front of E. These lights 
signaled to E the response of S. Each key 
activated a timing unit through a relay. This 
timing unit was composed of three Allied Radio 
timing circuits, which were used to control the 
delay and duration of information and over-all 
trial length. A Western Union tape transmitter 
with a continuous loop of tape controlled S’s 
information lights in step with the sequence of 
patterns on the film. The S’s response then 
would advance the strip film projector to the 
next (blank) frame, signal the response to E, and 
activate the timing unit. The timing unit 
would, after the delay, advance the Western 
Union tape transmitter, presenting the correct 
light to the S, and after 2 sec. activate the 
transmitter again, thus turning off the light, and, 
after a total of 10 sec. had elapsed, advance the 
strip film projector to the next pattern. 

A check was made on the variability within 
the timing circuits by calibrating each for a 10- 
sec. interval, the longest used in the present 
experiment, and recording the duration of each 
on 50 consecutive trials. The variances for the 
three circuits were .00019, .00057, and .00021 
sec., respectively. There were no significant 
differences among the three circuits. 

Design.—A 6 X 3 factorial design was used 
with three levels of problem difficulty, 1, 3 and 
5 bits of irrelevant information, and six levels of 
delay, .0, .5, 1.0, 2.0, 4.0, and 8.0 sec. In 
addition, three different problems, different with 
respect to the two relevant dimensions, were used 
to offset the possibility of one S passing the 
solution on to others. This variable was kept 
orthogonal to the other two so that its effect on 
performance could be evaluated. 
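
As a reading aid, the layout of the design can be enumerated in a few lines of Python. This is a sketch under stated assumptions, not part of the original procedure: the constant and variable names are invented for illustration, and the figure of three Ss per problem within each group is taken from the Summary of this report.

```python
from itertools import product

IRRELEVANT_BITS = (1, 3, 5)
DELAYS_SEC = (0.0, 0.5, 1.0, 2.0, 4.0, 8.0)
PROBLEMS = ("A", "B", "C")
SS_PER_PROBLEM = 3  # from the Summary: three Ss learned each problem in each group

# The 18 treatment combinations (delay x irrelevant information).
groups = list(product(DELAYS_SEC, IRRELEVANT_BITS))

# One entry per S: (delay, bits, problem, replicate index).
assignments = [
    (delay, bits, problem, k)
    for (delay, bits) in groups
    for problem in PROBLEMS
    for k in range(SS_PER_PROBLEM)
]

# Within-cell degrees of freedom when every delay x bits x problem cell holds 3 Ss.
residual_df = len(groups) * len(PROBLEMS) * (SS_PER_PROBLEM - 1)

print(len(groups), len(assignments), residual_df)
```

The counts agree with the 162 Ss reported and with the 108 df attached to the error term in the F ratios.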


RESULTS 


Initial equality in performance 
among the 18 groups was assumed 
since Ss were assigned randomly to 
the experimental conditions. No 
direct statistical tests could be made 
of this assumption and the correctness 
of all results and conclusions depends 
upon its validity. Three response 
measures were used: time to problem 
solution, trials to solution, and num- 
ber of errors. Figure 1 shows the 
mean amount of time, mean number 
of trials, and mean number of errors 
as functions of task complexity. 

Correlations.—As a check on the 
interrelatedness and validity of the 
three measures used in the study, 
product-moment correlation coeffi- 
cients were computed. The three 
r’s were significant. The correlation 
between time and trials was .936 
(t = 33.66, 160 df, P = .01); between 
time and errors, .861 (t = 21.40, 160 
df, P = .01); and between trials and 
errors, .934 (t = 33.00, 160 df, 
P = .01). 
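
The t values attached to these correlations follow from the standard test of a product-moment r against zero, t = r·sqrt(n − 2)/sqrt(1 − r²). The Python sketch below applies that formula to the three reported coefficients with n = 162; it is a check on the arithmetic, assuming this is the test that was used, and the function name is illustrative.

```python
from math import sqrt


def t_from_r(r: float, n: int) -> float:
    """t statistic (n - 2 df) for testing a product-moment r against zero."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


N_SS = 162  # gives the 160 df quoted in the text

reported = {
    "time-trials": (0.936, 33.66),
    "time-errors": (0.861, 21.40),
    "trials-errors": (0.934, 33.00),
}

computed = {name: t_from_r(r, N_SS) for name, (r, _) in reported.items()}
for name, (r, t_reported) in reported.items():
    print(f"{name}: r = {r:.3f}, t = {computed[name]:.2f} (reported {t_reported:.2f})")
```

The small discrepancies from the printed t values are of the size expected from rounding r to three decimals.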
Fig. 1. Mean time to solution, mean number 
of trials, and mean number of errors as a function 
of the amount of irrelevant information. Each 
plotted point represents the data from 54 Ss. 


Errors.—Since the instructions to 
S stressed accuracy rather than speed 
in solving the problems and because 
of the high intercorrelations among 
the three measures, only the analysis 
of the data based on mean number of 
errors to solution need be reported. 
The functions and the analysis of 
variance using time to solution and 
trials to solution indicated essentially 
identical results as the error measure. 

Figure 2 presents graphically the 
mean number of errors made as a 
function of log2 delay of information 
feedback. An analysis of variance 
was performed on these data and a 
summary of this analysis is presented 
in Table 1. A Bartlett’s test for 
homogeneity of variance was also 
made on the residual error term. 
Because of the significant hetero- 
geneity indicated by this test (χ² 
= 72.40, 8 df, P = .01), only F 
ratios with probability beyond the .01 
level were accepted. Two main 
effects in the analysis were significant: 
amount of irrelevant information 
(F = 37.56, 2 and 108 df, P = .01) 
and delay of information feedback 
(F = 9.99, 5 and 108 df, P = .01). 
The significance of the irrelevant 
information source of variance is 
Fig. 2. Mean number of errors as a function 
of log2 delay of information feedback. Each 
plotted point represents the data of 9 Ss. 


consistent with the results of Archer, 
Bourne, and Brown (1) and Brown 
and Archer (2) and indicates that the 
number of errors made in solving this 
type of problem is an increasing 
function of the amount of irrelevant 
information. The mean numbers of 
errors were 14.46, 28.70, and 50.33 
for 1, 3 and 5 bits of irrelevant 
information, respectively. Figure 1 
shows that this increase is nearly 
straight line in form. Statistical veri- 
fication of this was provided by an 
orthogonal polynomial analysis ap- 
plied to this function in which only 
the linear term reached an acceptable 
significance level (F = 73.96, 1 and 
108 df, P = .01). 

The number of errors was also 
significantly affected by the length of 
the delay interval. The mean num- 
ber of errors for the groups working 
under the six delays, .0 through 8.0 
sec., were 25.11, 25.92, 28.99, 31.88, 
39.18 and 63.88, respectively. Figure 
2 indicates that the increase is a 
positively accelerated function of log2 
delay; to test the significance of the 
trend in this curve, an orthogonal 
polynomial analysis was made. The 
data from the immediate information 
group (.0-sec. delay) were omitted 
from this analysis since the test 
requires equal intervals on the inde- 
pendent variable. The analysis of 
variance was redone to obtain the 
correct error term for the orthogonal 
polynomial components. Since the 
same sources of variance as in the 
original analysis were found sig- 
nificant, no consideration need be 
given to this revised analysis. Both 
the linear (F = 32.10, 1 and 90 df, 
P = .01) and the quadratic (F 
= 7.03, 1 and 90 df, P = .01) poly- 
nomials were found significant. This 


TABLE 1 

SUMMARY OF ANALYSIS OF VARIANCE ON 
NUMBER OF ERRORS 

Source of Variance               df         MS 

Irrelevant information (I)        2    21798.9* 
  Linear                          1    42920.5* 
  Quadratic                       1      667.4 
Delay† (D)                        5     5798.1* 
  Linear                          1    20020.8* 
  Quadratic                       1     4381.9* 
  Cubic                           1      835.6 
  Quartic                         1       27.3 
Problems (P)                      2     1401.2 
I X D                            10      786.3 
I X P                             4       94.5 
D X P                            10      335.6 
I X D X P                        20      607.6 
Residual‡                       108      500.3 

Total                           161 

* P = .01. 

† The orthogonal polynomial analysis of the Delay 
source was based upon the data of the .5-, 1.0-, 2.0-, 4.0-, 
and 8.0-sec. groups only. 

‡ Significant heterogeneity of variance beyond the .01 
level. 
is taken as evidence for the reliability 
of the positive acceleration of this 
function. 
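
The trend components for delay can be retraced from the group means given above. The Python sketch below is illustrative only: it uses the standard orthogonal polynomial coefficients for five equally spaced levels, and it assumes 27 Ss per delay level (162 Ss over six delays), a figure inferred from the design rather than stated for this analysis. With one degree of freedom per component, each sum of squares is also the mean square.

```python
from math import log2

DELAYS_SEC = [0.5, 1.0, 2.0, 4.0, 8.0]
MEAN_ERRORS = [25.92, 28.99, 31.88, 39.18, 63.88]
N_PER_DELAY = 27  # assumption: 162 Ss divided evenly over six delays

# Successive delays double, so the levels are equally spaced on a log2 scale.
log2_steps = [log2(b / a) for a, b in zip(DELAYS_SEC, DELAYS_SEC[1:])]

# Standard orthogonal polynomial coefficients for five equally spaced levels.
COEFFS = {
    "linear": [-2, -1, 0, 1, 2],
    "quadratic": [2, -1, -2, -1, 2],
    "cubic": [-1, 2, 0, -2, 1],
    "quartic": [1, -4, 6, -4, 1],
}


def contrast_ss(means, coeffs, n):
    """Sum of squares (1 df) for a contrast among group means of size n."""
    value = sum(c * m for c, m in zip(coeffs, means))
    return n * value ** 2 / sum(c * c for c in coeffs)


components = {
    name: contrast_ss(MEAN_ERRORS, c, N_PER_DELAY) for name, c in COEFFS.items()
}
for name, ss in components.items():
    print(f"{name:9s} {ss:10.1f}")
```

Under these assumptions the linear and quadratic components come to roughly 20,020 and 4,383, close to the corresponding Delay entries in Table 1, while the cubic and quartic components are small by comparison.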

As is consistent with previous 
results (11), the interaction of delay 
by irrelevant information was not 
significant. Inconsistent with previ- 
ous results (1, 2) is the fact that 
problems were not a significant source 
of variance. This may be due to the 
fortuitous combination of dimensions 
which define the problems in this 
experiment since the dimensions for 
each of the three problems were 
chosen randomly. However, as in 
the results of Brown and Archer (2), 
the most difficult problem in the 
present experiment, Problem B, con- 
tained a “positional” dimension, i.e., 
a dimension referring to the position 
of the figure on the screen and not 
associated with the stimulus pattern 
per se. The finding suggests that 
there is something about the “posi- 
tional” dimensions which makes them 
less available to S for solving the 
problems. 


Discussion 


The results of this experiment support 
the hypothesis that task difficulty in 
concept identification can be quantita- 
tively controlled by varying the amount 
of irrelevant information in the stimuli. 
Performance was found to be an inverse 
linear function of the amount of irrele- 
vant information. In the experiments 
by Archer, Bourne, and Brown (1), 
performance was found to be a positively 
accelerated decreasing function of task 
difficulty. However, the findings of the 
present study do not contradict the 
previous results. One factor which 
might have accounted for the disparity 
in the results of the two sets of experi- 
ments is the difference in S’s response 
procedure. The present investigation 
employed a modified noncorrection pro- 
cedure in which S was allowed only one 
response to each stimulus pattern; in the 
previous studies a correction procedure 
had been employed in which S continued 
making responses until he found the 
correct one. A comparison of the data 
of the two experiments indicates that the 
noncorrection procedure increases the 
difficulty of the task. In addition, 
although the stimulus dimensions were 
roughly the same in the two studies, 
there were differences in the mode of 
presentation of these stimuli to S. In 
the experiment by Archer, Bourne, and 
Brown (1), patterns appeared on the face 
of an oscilloscope; whereas, in the present 
experiment, stimuli were projected onto a 
screen by means of a strip film projector. 
In view of these differences, it is not 
surprising to find that the performance- 
complexity functions are not identical in 
the two studies. 

Delay of information feedback was a 
consistently significant source of variance 
in the analyses. Performance degraded 
as a positively accelerated function of the 
length of the delay interval. This result 
is consistent with most previous data 
from experiments investigating the 
effects of the delay variable. Grice (6), 
Greenspoon and Foreman (5), and 
Spence (13) in essence propose that the 
stimulation associated with making the 
response persists for a period of time. 
However, these stimuli decay as a func- 
tion of time and the length of delay in 
reinforcement over which learning can 
occur depends upon the rate of decay of 
this stimulus complex. On the other 
hand it might be suggested that the 
stimulus pattern presented to S on any 
trial fades as a function of time and that 
forgetting is the principal reason why 
reduced performance results from de- 
laying reinforcement. If either or both 
of these propositions is indeed correct, 
several hypotheses should hold true in 
future experimentation. It would be 
expected that requiring S to maintain 
his key-pressing activity until the in- 
formation feedback was presented would 
increase his rate of learning if decay of 
the response trace is the most important 
variable in the delay effect. On the 
other hand, if forgetting the presented 
stimulus pattern is the effective result, 
a situation in which the pattern remains 
on the screen, instead of the screen 
becoming blank when S responds, should 
increase the rate of learning or goodness 
of performance. Another method of 
investigating this latter hypothesis would 
be to present S with stimuli, of varying 
degrees of similarity to the patterns in 
the problem, instead of a blank screen 
during the delay interval. These stimuli 
would probably interfere with the 
stimulus trace of the original pattern to 
which S responded and thus increase the 
effectiveness of delay as an inhibitor of 
performance. 

It should be noted that there exists no 
significant interaction between delay and 
task complexity. Although none of the 
experimental evidence that exists at 
present has found the interaction a sig- 
nificant factor, some, in particular the 
data of Saltzman (12), tend to indicate 
an effect. Figure 2 shows a tendency 
for the three curves corresponding to the 
three levels of irrelevant information to 
diverge at the longer delay intervals. 


It is quite plausible to argue that, had 
several longer delay intervals been 
introduced into the experimental design, 
a significant interaction might have been 
found. The performance-delay function 
for the condition involving 1-bit irrele- 
vant information, upon extrapolation, 
suggests that performance will continue 
at a high level with increase in delay 
whereas the function for the condition 
involving 5-bit irrelevant information indicates that, 
with a further increase in delay, problem 
solution might never be reached. 


SUMMARY 


Concept identification was studied as a 
function of both delay of information about the 
correctness of a response and the degree of task 
complexity as measured in terms of the amount 
of irrelevant information in the stimuli. The S’s 
task was to classify geometric patterns into four 
categories. Each of the 162 Ss served individu- 
ally in one of 18 groups. Nine served in each 
group, three learning each of the three problems 
within a group. The experimental design was a 
6 X 3 factorial with .0-, .5-, 1.0-, 2.0-, 4.0-, and 
8.0-sec. delays in information, and 1, 3, and 5 
bits of irrelevant information. All Ss were 
given standard tape-recorded instructions and 
served to a criterion performance of 32 con- 
secutively correct identifications. The delay 
and trial-time intervals and the presentation of 
the stimulus patterns were automatically con- 
trolled. 

The essential results were: (a) The main 
effect of delay of information feedback was 
a consistently significant source of variance. 
Performance decreased at a positively accele- 
rated rate as log2 delay increased. (b) Varying 
the amount of irrelevant information was an 
effective method of manipulating task difficulty. 
Performance decreased linearly with an increase 
in irrelevant information. (c) The interaction 
of delay with complexity was not found 
significant. 
SOME SOURCES OF ERROR IN HALF-HEAVINESS 
JUDGMENTS 


TRYGG ENGEN AND ÜLKER TULUNAY 


Brown University 


Recent experiments, notably those 
of Garner (2, 3, 4), have raised doubts 
about the suitability of fractionation 
judgments for developing ratio scales. 
In one experiment Garner’s (4) un- 
practiced Os made half-loudness 
judgments which seemed almost com- 
pletely dependent upon stimulus con- 
text. Each of three randomly as- 
signed groups was presented the same 
standard (90 db) but a different and 
nonoverlapping range of comparison 
tones (55-65, 65-75, 75-85 db). The 
highly reliable half-loudness values se- 
lected by each group were at the mid- 
point of the series of comparison tones. 
Garner concluded that this finding 
demonstrated that half-loudness judg- 
ments are invalid, but that the high 
reliability of the judgments suggested 
that even though O was unsure of 
what sounds half as loud as the stand- 
ard, he decided on the basis of the 
choices provided by the first com- 
parison tones what he was going to 
judge as half and maintained con- 
sistency in this judgment. 

Stevens and Poulton (8), however, 
have questioned the generality of 
Garner’s conclusion, arguing that 
Garner’s method of constant stimuli 
“constrained” or restricted O unduly. 
These investigators attempted to 
minimize constraint and thus the 
effect of context by giving an un- 
practiced O only one comparison tone 
which he adjusted with a knob cali- 
brated in terms of sones. Their 
results were concordant with those 
of previous experiments (see also 
Stevens [7]). 

The findings of these two experi- 
ments suggest that with respect to 
the validity of fractionation judg- 
ments, differences in results may be 
based on: (a) the method with which 
the judgments were obtained; (b) the 
sophistication of O, e.g., Garner’s 
invalid constant stimuli judgments 
(4) were obtained from unpracticed 
Os; and (c) the values of the first 
comparison stimuli presented to O. 
In the present study experiments 
were designed to test the effects of 
these variables on half-heaviness 
judgments. 


METHOD 


Observers.—Ninety-two Os (61 males, 31 
females) were used in three experiments. 
Seventy-two elementary psychology students 
who had no previous experience in psychophysics 
served as naive Os, and 20 psychology staff 
members and graduate students were the 
relatively more sophisticated Os.1 

Experiment I.—The objectives of this experi- 
ment were to repeat Garner’s experiment with 
weight judgments and to determine whether or 
not more sophisticated Os are also subject to 
context effects. The Os received the following 
instructions: 

“The purpose of this experiment is simply to 
find out how different two weights must be for 
one of them to feel half as heavy as the other. 
We are not concerned with the physical differ- 
ence between weights, but only with what feels 
half as heavy to you. 

“You will lift a series of pairs of weights by 
hefting them from the elbow with the first three 
fingers of your preferred hand—like this. The 
first weight will always be the same. The second 
will vary in weight from one time to another. 
Your task is to decide whether the second weight 


1 “Sophisticated” is used merely as a con- 
venient label to indicate that these Os were 
psychologists. They were not specially ex- 
perienced as Os or Es in psychophysics and 
knew no more about the experiment than the 
naive Os. 
feels more than half as heavy or less than half 
as heavy as the first weight. Respond by saying 
‘more than half’ or ‘less than half.’ You may 
lift each pair twice. 

“Of course, all the weights may feel more than 
half as heavy or all of them may feel less than 
half as heavy; this is what we are trying to 
determine, so make as careful judgments as you 
can. Any questions?” 

Ten naive Os were assigned randomly to light 
comparison weights (78, 100, 110, 120, 139, and 
150 gm.) and 10 were assigned to heavy com- 
parison weights (150, 165, 175, 185, 200, and 
210 gm.). Twenty sophisticated Os were simi- 
larly assigned to light vs. heavy comparison 
weights. The standard weighed 300 gm. and 
was the same for all four groups. Only the 
heavy set included weights judged half as heavy 
as this standard in all previous investigations 
(1, 5, 6). The weights, selected from Guilford 
and Dingman’s nearly geometric series (5), were 
prepared from identical, black, hard-rubber, 
4 X 6 cm. cylinders filled with shot and cotton. 
The set of comparison weights was presented to 
O five times, each time in a different random 
order. 

Experiment II.—This experiment was de- 
signed to test the influence of the first comparison 
weights on the subsequent half-heaviness judg- 
ments. The standard was again 300 gm. but 
the comparison weights were 120, 139, 150, 165, 
175, and 185 gm. for all Os. Ten Os, randomly 
selected, received practice with heavy compari- 
son weights—185, 200, and 210 gm.—and 10 Os 
received practice with light comparison weights, 
78, 100, and 110 gm. Each of the three practice 
weights was presented with the standard twice 
for a total of six practice trials. Except for the 
addition of practice, the procedure was the same 
as in Exp. I. 

Experiment III.—This experiment employed 
the method of adjustment in order to obtain 
judgments that could be compared with those 
of the above constant stimuli experiments as 
well as previous experiments. Eight Os were 
randomly assigned to each of four groups. The 
groups differed only in the standard weight used; 
the four standards weighed 150, 300, 550, and 
900 gm. The same cylinders as for the above 
experiments were used for the 150 and 300 gm. 
standards, but for the two other standards it 
was necessary to use larger (5 X 10.5 cm.) 
cylinders. Identical cylinders, padded with 
cotton to prevent noise, were used for the 
variable weights. 

Each O gave eight half-heaviness judgments. 
Four ascending and four descending trials were 
counterbalanced for series effects. Although 
varying from trial to trial, the weight of the 
comparison weight was always well below the 
subjective half for ascending trials and well 
above for descending trials in terms of values 
obtained in previous research (5). The O, of 
course, was never informed about the physical 
weights of the stimuli. 

The O and E were seated at opposite sides of 
a table with screens placed between them in 
such a way that O could not see the weights he 
was lifting. The instructions included the 
statement of the purpose of the experiment as 
in the above instructions for Exp. I. The 
procedure is described by the remaining 
instructions: 


“You will be given a container which you 
will lift with your preferred hand—like this. 
You will first be given the standard weight. 
Then you will be given another container which 
will feel either more or less than half as heavy 
as the standard. What I want you to do is to 
tell me to add more weight or subtract some from 
this weight until it feels just half as heavy as the 
standard. You may subtract or add to the 
comparison weight as often as you wish before 
you make your final half-heaviness judgment. 
We will repeat the procedure several times. 
Any questions?” 


The E used a measuring spoon to add and 
subtract from the standard. The amount 
added at one time varied but it was never more 
than 10% or less than 1% of the standard 
weight. 


RESULTS 


Experiment I.—The weight value 
judged by O to be half as heavy as 
the standard was estimated by inter- 
polation on the psychophysical func- 
tion from the judgments “more than 
half as heavy.” The medians and 
semi-interquartile ranges of these 
individual half-heaviness values are 
presented in Table 1. Half-heaviness 


TABLE 1 

MEDIAN HALF-HEAVINESS VALUES FOR NAIVE 
AND SOPHISTICATED Os AND FOR HEAVY 
AND LIGHT COMPARISON WEIGHTS 
(Standard = 300 gm.) 

Comparison           Naive Os        Sophisticated Os 
Weights            Median    Q       Median    Q 

78-150 gm. 
150-210 gm. 
Fig. 1. Cumulative frequencies of judgments 
of “more than half” given by naive and sophisti- 
cated Os for light and heavy weights compared 
with a 300-gm. standard. 


values could not be determined for 
two sophisticated Os because one of 
them rejected all the light and the 
other all the heavy comparison 
weights as less than half as heavy as 
the standard. Moreover, since the 
sophisticated Os in the light com- 
parison group predominantly rejected 
these comparison weights as less than 
half as heavy, the median reported 
for this group cannot be considered 
as reliable as the medians for the other 
three groups. 
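
The interpolation step can be sketched as follows. This is a minimal Python illustration, assuming simple linear interpolation between the two comparison weights that bracket the 50% point of “more than half” judgments; the exact interpolation rule is not specified in the report, the function name is invented here, and the proportions shown are hypothetical, not data from the experiment.

```python
def half_heaviness(weights, p_more):
    """Weight at which the proportion of 'more than half' judgments reaches .50.

    Returns None when the proportions never cross .50, as happens when an O
    calls every comparison weight 'less than half' or every one 'more than half'.
    """
    points = list(zip(weights, p_more))
    for (w0, p0), (w1, p1) in zip(points, points[1:]):
        if p0 <= 0.5 <= p1 and p1 > p0:
            # Linear interpolation between the two bracketing weights.
            return w0 + (0.5 - p0) * (w1 - w0) / (p1 - p0)
    return None


# Heavy comparison series from the text; proportions are HYPOTHETICAL.
heavy_series = [150, 165, 175, 185, 200, 210]
example_props = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
estimate = half_heaviness(heavy_series, example_props)

# An O who rejects every weight as 'less than half' yields no estimate.
no_estimate = half_heaviness(heavy_series, [0.0] * 6)

print(estimate, no_estimate)
```

Returning no value when the function never crosses 50% mirrors the two sophisticated Os for whom half-heaviness values could not be determined.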

Figure 1 presents cumulative fre- 
quencies of judgments being more 
than half as heavy as the 300-gm. 
standard and indicates the extent to 
which sophisticated and naive Os 
differed in their judgments of light 


Fig. 2. Median proportions of judgments 
of “more than half” for each of six successive 
weights compared with a 300-gm. standard. 


and heavy comparison weights. Dif- 
ferences between any two groups 
were evaluated by the Mann-Whitney 
test on the sum of ranks of each O's 
total number of judgments being 
more than half as heavy. The dif- 
ference between sophisticated Os in 
the heavy and light groups and the 
difference between naive and sophisti- 
cated Os in the light group are significant 
with P values of .01 and .05, respec- 
tively. The difference between naive 
Os in heavy vs. light groups is not 
significant. 

Experiment II.—The median pro- 
portions of “more than half’ judg- 
ments for each of six comparison 
weights obtained by light and heavy 
practice groups are presented in Fig. 
2. An analysis of each O's judgments 
separately indicated that the results 


TABLE 2 


MEAN AND MEDIAN HALF-HEAVINESS VALUES FOR EACH STANDARD FOR ASCENDING, 
DESCENDING, AND ALL JUDGMENTS 


Ascending 
Standard 
(Gm.) 


Mean | Mdn. SD 
82.7 
140.0 
306.5 
501.7 


94 
22.3 
48.4 
34.6 


81.8 
133.9 
303.4 
503.1 








Descending 


Guilford 
& 


Dingman 


539.0 
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are in the expected direction only for 
the first five or six judgments im- 
mediately following practice, but the 
Mann-Whitney test provides no sta- 
tistical evidence that there is an over- 
all difference between the groups. 

Experiment III.—The means, medi-
ans, and SD’s for the ascending, de-
scending, and combined half-heaviness
judgments obtained with the method 
of adjustment are presented in Table 2 
along with the results of Guilford and 
Dingman’s (5) experiment. Individ- 
ual coefficients of variation ranged 
from about 5 to 25 for each of the 
four groups. 

Thirty-one Os had higher mean
judgments for descending trials than
for ascending trials, which is a highly
significant difference according to the
sign test (P < .01). Moreover, the
differences between medians are con- 
sistently greater than the differences 
between means for ascending and 
descending trials. This suggests that
the judgments tend to be positively
skewed for ascending trials and nega-
tively skewed for descending trials.


Discussion 


Experiments I and II, which utilized 
the method of constant stimuli, sub- 
stantiate Garner’s (4) finding that frac- 
tionation judgments are largely deter- 
mined by context. Regardless of the 
absolute value of the comparison weight, 
O had a tendency to judge it to be more 
than half as heavy if it was one of the 
heaviest in the series and to be less than 
half as heavy if it was one of the lightest 
in the series. It is evident, nevertheless, 
that sophisticated Os were less influenced 
by context than naive Os. 

The present data do not seem to 
support Garner’s (4) hypothesis that O 
decides on the basis of the choices pro- 
vided by the first comparison stimuli 
what he will judge to be subjectively 
half of the standard. The judgments 
made by Os who had preliminary practice 
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with heavy comparison weights did not 
differ significantly from the judgments 
of Os who had preliminary practice with 
light comparison weights. Both groups 
selected half-heaviness values close to 
the midpoint of the range of comparison 
weights used in the experiment proper. 

The average half-heaviness values 
obtained by the method of adjustment in 
Exp. III were also influenced by context. 
In the relatively light stimulus context 
of ascending trials O judged a lighter 
weight as being half as heavy as the 
standard than in the heavier context of 
descending trials. Of course, another 
way of stating it is that series effects 
show up markedly, but may be balanced 
out by combining ascending and de- 
scending series. 

In this connection, it is important to 
note that the combined half-heaviness 
judgments are in very good agreement 
with those obtained by Guilford and 
Dingman (5) in an experiment in which 
Os (graduate students in psychology) 
selected a half-heaviness value from a 
tenfold range of 17 or 21 comparison 
weights. These fractionation judg- 
ments, in turn, compared well with 
constant-sum judgments. These agree- 
ments support Stevens and Poulton's (8) 
argument in favor of such psychophysical 
methods, all of which leave O free of 
constraint imposed by restricted stimulus 
or response categories. 

In conclusion, it is suggested that 
errors in ratio judgments may be related 
to both the type of observer and the 
psychophysical method employed. The 
use of unpracticed and naive Os as 
instruments of precision, e.g., in meas- 
uring sensory magnitudes on a ratio 
scale, may not be unlike the use of 
uncalibrated physical instruments. 
With such instruments, constant errors,
such as those associated with context,
may remain unknown. It may, how-
ever, be feasible to “calibrate” the
human O by giving him experience with
the various types of psychophysical
judgments and their sources of bias. A
practiced and sophisticated O might
well be a reliable and valid instrument.
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The difference between the completely 
unpracticed, naive Os and the relatively 
more practiced, sophisticated Os in the 
present study appears to be consistent 
with this viewpoint. 


SUMMARY 


This study evaluated the validity of half- 
heaviness judgments obtained by the method of 
constant stimuli and the method of adjustment. 

The results were: (a) The half-heaviness 
values Os selected in the constant stimuli ex- 
periments depended largely on the context 
provided by the range of comparison weights. 
(b) Relatively sophisticated Os were less in- 
fluenced by context than naive Os. (c) Pre- 
liminary practice with relatively heavy or 
relatively light comparison weights had a small 
and statistically insignificant effect on the 
subsequent judgments. (d) The method of 
adjustment, as compared with the method of 
constant stimuli, yielded results more in line 
with those obtained in previous experiments. 
However, the judgments obtained with the 
method of adjustment were also influenced by 
context since ascending and descending trials 
did not yield similar results. 

Errors in psychophysical judgments are 
discussed in relation to the sophistication and 
practice of the observer. 


TRYGG ENGEN AND OLKER TULUNAY 
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RETENTION AND MEANINGFULNESS OF MATERIAL 
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University of Pittsburgh 


In their surveys of verbal learning, 
McGeoch and Irion (3) and Osgood 
(6) have pointed out that no syste- 
matic study had been conducted to 
determine the relationship between 
retention and the meaningfulness of 
verbal material. The general belief 
is that the more meaningful the 
material the better the retention (3), 
but it has also been argued that 
nonsense material should be less
susceptible than highly meaningful 
material to interpolated interference 
because the former is infrequently 
encountered during the period be- 
tween learning and the test for re- 
tention (6). More recently, how- 
ever, several studies have appeared 
containing information which bears 
on this relationship and which are 
thus important to theories of for- 
getting. Archer (1) measured the 
retention of high and low meaningful 
nonsense syllables after intervals of 
4 sec., 2 min., 5 min., and 10 min. 
As measured by the method of aided 
recall, retention “decreased as the 
rest interval increased and was un- 
related to meaningfulness” (1, p. 251). 

Underwood and Richardson (7) 
investigated the relationship between 
meaningfulness (defined also in terms 
of the association value of nonsense 
syllables), intralist similarity, serial 
position, and recall and relearning 
after 24 hr. Pertinent to the present 
study is their conclusion that for the 
materials they used meaningfulness 
does not influence recall but does 
significantly influence relearning. 

The purpose of the present study 
was to furnish additional information 
on the meaningfulness-retention rela- 


tionship by measuring the retention 
of high, moderate, and low meaningful 
material after 1 and 7 days by the 
methods of aided recall, unaided 
recall, relearning, and reconstruction. 
Luh (2) has shown that the extent of 
retention varies with the method of 
measurement. 


Method


Meaningfulness of material.—Lists of 12
words of high, medium, and low stimulus
meaning were selected from Noble’s m-scale
(4). The mean m-values of the lists were 7.85,
4.42, and 1.28, which were identical with the
mean m-values of the lists used by Noble in his
investigation of the role of stimulus meaning in
serial verbal learning (5). The construction of
the lists followed Noble’s rule.

Original learning (OL).—OL conformed to
the following typical procedures: 12-item lists;
2-sec. interitem intervals; 6-sec. intertrial
intervals; the pronunciation-anticipation method
with correction of errors; a criterion of 2 suc-
cessive errorless trials. Material was presented
on a Hull-type memory drum.

Retention measures.—Retention was tested 1
and 7 days after learning. Since retention has
been found to be affected by the method of
measurement (2), the following four typical
methods were used in this study: (a) Unaided
recall—written reproduction. The S was handed
a blank sheet of paper and asked to write as
many of the learned words as could be re-
membered in any order. His score was the
number of words correctly reproduced. This
was converted to a percentage score of the
possible number. Minor spelling errors were
not penalized. (b) Reconstruction. Stimulus
words were printed on 3 X 5-in. cards which
were handed in random order to S with in-
structions to reconstruct the order in which the
words had been learned. The reconstruction
was scored both as to position and sequence,
with correction for chance factors, after the
manner employed by Luh (2, pp. 19-21). (c)
Relearning. The Ss relearned the stimulus list
to the same criterion used in original learning.
The following formula was used to obtain a
percentage score: [(OL trials − 2) − (RL trials
− 2)]/(OL trials − 2). (d) Aided recall—an-
ticipation. The first trial of relearning provided
the data for this method. The S’s score was the
number of words correctly anticipated on this
trial, after conversion to a percentage score.
Different groups of Ss were given the various 
retention tests except for the relearning and 
aided recall methods. A preliminary test of 9 
Ss with the recognition method showed that it 
was not a discriminative technique. 
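
The relearning score described under (c) can be sketched in a few lines. This is only an illustration: the function name and the multiplication by 100 are assumptions (the text calls the result a percentage score but prints the formula without the factor), and the two criterion trials are subtracted from both counts as the formula indicates.

```python
def savings_percentage(ol_trials, rl_trials, criterion_trials=2):
    """Percentage saved in relearning, following the printed formula.

    ol_trials: trials to criterion in original learning (OL).
    rl_trials: trials to criterion in relearning (RL).
    criterion_trials: the 2 errorless criterion trials removed from both counts.
    The factor of 100 is an assumption made to express the ratio as a percentage.
    """
    ol = ol_trials - criterion_trials
    rl = rl_trials - criterion_trials
    if ol <= 0:
        raise ValueError("original learning must exceed the criterion trials")
    return 100.0 * (ol - rl) / ol


# Hypothetical S: 22 trials in original learning, 6 trials in relearning.
example_score = savings_percentage(22, 6)
```

With these hypothetical counts the S saves 16 of 20 non-criterion trials, so the score is 80.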

Subjects.—The Ss were 90 undergraduate 
students at the University of Pittsburgh. There 
were 57 women and 33 men. Ages ranged from 
17 to 46 with a mean age of 20.37 yr. All were 
volunteers and were not informed of the purpose 
of the study. 

This study utilized a 3 X 2 X 4 factorial
design in which 5 Ss were randomly assigned to 
each cell. It should be kept in mind that the 
same Ss contributed the data for the relearning 
and aided recall scores. 


RESULTS 


Learning as a function of meaning- 
fulness (m).—Mean trials to attain 
the criterion of two successive error- 
less trials were 25.10 for the high-m 
list, 33.20 for the medium-m list, and 
47.87 for the low-m list. The SD’s of 
the distributions were 11.54, 11.61, 
and 17.65, respectively. 

A single classification analysis of 
variance was performed to test the 
significance of the differences among 
these means. The between-groups 
variance was significant at the .01
level (F = 20.68, df = 2, 87). Bart-
lett’s test for homogeneity of variance
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yielded a chi square of 7.24 which was 
significant at the .05 level. A com- 
parison of the three variances yielded 
an insignificant F for the high-m and 
medium-m comparison (F = 1.01) 
and F’s which were significant at the 
.05 level for the high-m and low-m
comparison (F = 2.34) and the
medium-m and low-m comparison 
(F = 2.31). 

When mean trials to reach the criterion
of learning on the high-m and medium-m
lists were compared, a t of 2.99 was ob-
tained (P < .01, df = 58). Com-
parisons of mean trials to criterion
for the high-m and low-m lists, and
for the medium-m and low-m lists
yielded t’s of 5.91 and 3.80, respec-
tively (P < .01 in each case, df = 29
since the variances of these distri-
butions were heterogeneous).
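
The comparisons involving the low-m list, and the variance ratios reported above, can be approximated from the printed means and SD's alone. The sketch below is not the authors' computation. It assumes 30 Ss per list (90 Ss over three lists, consistent with df = 2, 87) and uses each group's own variance in the standard error, as is usual when variances are heterogeneous. All names are illustrative.

```python
from math import sqrt

# Printed means and SD's of trials to criterion for each list.
LISTS = {
    "high": (25.10, 11.54),
    "medium": (33.20, 11.61),
    "low": (47.87, 17.65),
}
N_PER_LIST = 30  # assumption: 90 Ss divided evenly over the three lists


def t_from_summary(mean_a, sd_a, mean_b, sd_b, n=N_PER_LIST):
    """t for two independent groups of equal size, from means and SD's."""
    standard_error = sqrt(sd_a ** 2 / n + sd_b ** 2 / n)
    return (mean_b - mean_a) / standard_error


def variance_ratio(sd_a, sd_b):
    """F ratio of the larger variance to the smaller one."""
    larger, smaller = max(sd_a, sd_b), min(sd_a, sd_b)
    return (larger / smaller) ** 2


t_high_low = t_from_summary(*LISTS["high"], *LISTS["low"])
t_medium_low = t_from_summary(*LISTS["medium"], *LISTS["low"])
f_high_low = variance_ratio(LISTS["high"][1], LISTS["low"][1])
f_medium_low = variance_ratio(LISTS["medium"][1], LISTS["low"][1])
f_high_medium = variance_ratio(LISTS["high"][1], LISTS["medium"][1])
```

Under these assumptions the two t values come out near the printed 5.91 and 3.80, and the three variance ratios near 2.34, 2.31, and 1.01.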

Thus, as meaningfulness of material 
decreased, both trials to learn as well 
as variability increased. This finding 
is in accord with that of Noble (5) 
and other investigators who used 
different definitions of meaningfulness. 

Retention.—Since the only experi- 
mental control over Ss’ learning 
ability was that of random assignment 
to the various experimental condi- 
tions, further statistical control of this 
variable was imposed. Both raw 


TABLE 1 


Analysis of Variance of Coded Retention Scores


Source 


Meaningfulness (M) 
Retention interval (R) 
MXR 

Within groups 








171.99 


* P = .05.
** P = .01.


Unaided Recall 


ms | 
1174.32 


783.57 
236.18 | 1.37 


Recon- 
struction 


MS F 
5.84°* 


12,87°* 
0.71 


852.19 
1879.89 
103.07 
146.03 


9.12 
| 26.53 


9.12 
0.35 | 
76.65 
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TABLE 2 


Analysis of Variance of Raw Retention Scores

                        | Unaided Recall   | Reconstruction   | Relearning       | Aided Recall
Source                  |   MS       F     |   MS       F     |   MS       F     |   MS       F
Meaningfulness (M)      | 11.23     2.46   |  0.31     0.10   | 59.70     4.85*  |  4.90     0.81
Retention interval (R)  | 48.13    10.53** | 38.53    11.82** | 80.03     6.51*  | 43.20     7.14*
M X R                   |  8.63     1.89   |  0.76     0.23   | 29.03     2.36*  |  0.30     0.05
Within groups           |  4.57            |                  | 12.30            |  6.05

* P = .05.
** P = .01.

learning and retention measures were
converted to z-scores, multiplied by
10, and algebraically added to the
constant 40. Each S’s learning
standard score was then subtracted
from his retention standard score and
50 was added to the difference to
eliminate negative values. These
coded difference scores were sub-
jected to a 3 X 2 analysis of variance
for each retention method. The re-
sults of these analyses are presented
in Table 1.

It can be seen that the main effects
of meaningfulness of material and
retention interval are significant at
customarily accepted levels of con-
fidence only when retention is meas-
ured by the methods of unaided recall
and aided recall. Neither of these
variables influences retention when
measured by the relearning or recon-
struction methods. The interaction
of these two variables is nonsignificant
when retention is measured by each 
of the four methods. 
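
The coding of learning and retention scores described above can be sketched as follows. The scale of 10, the constant of 40, and the added 50 are taken from the text; the use of the population SD and all function names are assumptions, since the paper does not say how the z-scores were computed.

```python
from statistics import mean, pstdev


def standard_scores(values, scale=10.0, constant=40.0):
    """z-scores multiplied by `scale` and added to `constant`."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population SD of the group
    return [constant + scale * (v - m) / sd for v in values]


def coded_difference_scores(learning, retention, offset=50.0):
    """Retention standard score minus learning standard score, plus `offset`."""
    learning_std = standard_scores(learning)
    retention_std = standard_scores(retention)
    return [r - l + offset for l, r in zip(learning_std, retention_std)]


# Hypothetical data for three Ss: trials to criterion and percentage retained.
coded = coded_difference_scores([20, 30, 40], [90, 80, 70])
```

Because both sets of standard scores share the same constant, it cancels in the subtraction, and the coded scores are centered on the added 50.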

When control over learning ability 
is not exercised, and when the raw 
retention measures were subjected 
to analyses of variance, the results 
in Table 2 were obtained. The 
effect of retention interval is signi- 
ficant in the case of each measurement 
method but that of meaningfulness is 
significant only in the case of re- 
learning. Only with this method is 
the Retention Interval and Meaning- 
fulness interaction significant. 

Table 3 contains the mean per- 
centage retention scores yielded by 
the four methods of measurement on 
the three lists. Comparable values 
of the coded retention scores appear 
in Table 4. 

In those cases where meaningful- 
ness of material is a significant source 
of variation (aided and unaided 
recall), the amount retained is directly 


TABLE 3 


Percentage Retention Scores of the Lists


Meaningfulness 


1 Day 


95.00 
100.00
97.50 


1 Day | 7 Days 
100.00 
93.33 
85.00 


High m 
Medium m 
Low m 


83.33 
55.00 
76.67 


pi | 
Unaided Recall | Reconstruction 
| 


Relearning Aided Recall 


1 Day | 7 Days 1 Day 7 Days 
82.53 | 
75.91 
79.42 


90.71 | 
81.07 
85.46 


75.00 | 
68.33 | 
66.67 | 


58.33 
46.67 
45.00 
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TABLE 4 


Coded Difference Retention Scores of the Lists


Unaided Recall 


Meaningfulness 


1 Day 7 Days 
58.78 
38.24 
36.29 


62.64 
59.55 
41.78 


High m 
Medium m 


Low m 49.65 








related to the meaningfulness of the 
material. A similar trend exists in 
the retention data obtained by re- 
learning and reconstruction, although 
the effect of meaningfulness is not 
statistically significant. Where the 
retention interval is a significant
effect, retention scores obtained 7
days after learning are lower than
those obtained 1 day after learning.


Discussion 


This study has shown that when 
measured by the relearning and recon- 
struction methods the effect of meaning- 
fulness on retention 1 and 7 days after 
learning was nonsignificant. However, 
at these same intervals level of retention 
decreased significantly as meaningfulness 
of material decreased when the methods 
of aided and unaided recall were used. 
While not directly comparable, these 
findings are not consistent with those of 
Archer (1) who showed aided recall up 
to 10 min. after learning to be unrelated 
to meaningfulness of nonsense syllables, 
or with those of Underwood and Richard- 
son (7) who found that at 24 hr. meaning- 
fulness did not significantly influence 
recall. However, Underwood and
Richardson (7) have demonstrated that 
the rate of relearning at 24 hr. was 
directly related to meaningfulness, and 
the results of this study show a similar 
relationship at 1 and 7 days when raw 
relearning scores are considered. 

The present experiment has also shown 
that in general the more meaningful the 
material, the greater the retention at 
both 1 and 7 days after learning. 


Reconstruction 


Relearning Aided Recall 


7 Days 1 Day 
49.43 51.16 
52.77 50.34 
45.22 49.21 


7 Days 7 Days 








Finally, this study bears upon Luh’s 
(2) investigation of the form of the re- 
tention curves as a function of the 
method of measurement. Luh did not 
vary meaningfulness, and his retention 
intervals were 20 min., 1 hr., 4 hr., 1 day, 
and 2 days. In half of his 90 lists of 12 
syllables each, the measures of written 
reproduction, recognition, and recon-
struction were applied in sequence to the
same Ss. The percentage scores at 1 
day in the present study are much higher 
for all classes of material than those 
obtained by Luh at 1 day. For Luh 
these were: 39.2 (unaided recall), 50.9 
(reconstruction), 52.1 (relearning), and 
17.8 (aided recall-anticipation). Simi- 
larly, in this study the percentage re- 
tention scores at 7 days were higher than 
those obtained by Luh at 2 days. Luh’s 
scores at 2 days were: 10.0 (anticipation), 
47.7 (relearning), 26.7 (unaided recall), 
and 38.6 (reconstruction). It can also 
be seen in Table 3 that the rank order 
of the methods in the amount of re- 
tention they yield differs from that of 
Luh at 1 day. However, in both studies, 
the obtained retention scores appear to 
be negatively accelerated. 


SUMMARY 


This study investigated the relationship 
between meaningfulness of material and re- 
tention 1 and 7 days after learning. High, 
moderate, and low meaningful material was 
selected from Noble’s lists and learned by the 
serial anticipation method. Retention was 
measured by the methods of aided recall, unaided 
recall, relearning, and reconstruction. Rate of 
original learning was found to be directly related 
to meaningfulness. When retention was meas- 
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ured by the methods of aided recall and unaided 
recall, the main effects of meaningfulness and 
retention interval were significant beyond the 
.01 level. Neither of these variables was
significant when retention was measured by the
reconstruction and relearning methods.
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LEARNING WITHOUT AWARENESS AND EXTINCTION 
FOLLOWING AWARENESS AS A FUNCTION 
OF REINFORCEMENT 
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Thorndike considered the demon- 
stration of human learning without 
awareness (hereafter called LWA) to 
be an important step in the verifi- 
cation of his theory of effect, viz., 
that the effects of rewards on stimulus- 
response connections can be inde-
pendent of S’s understanding. In
support of this hypothesis Thorndike 
and Rock (10) attempted to show that 
human Ss can learn to give a class of 
verbal responses in a free-association 
experiment without being aware of 
the basis of their responses. Since 
the learning curve for their Ss rose 
gradually rather than suddenly 
Thorndike and Rock concluded that 
the Ss could not have been aware of 
the principle. 

Irwin, Kaufman, Prior, and Weaver 
(6) showed, however, that the shape 
of the learning curve by itself is not 
an adequate index of the presence or 
absence of awareness. Even when 
Ss were explicitly taught the correct 
principle, the acquisition curve still 
showed a gradual rise; i.e., successful 
application of the principle required 
practice. 

In an attempt to obtain an inde- 
pendent criterion of awareness, Post- 
man and Jarrett (9) repeated the 


1 This paper was prepared during the author’s
tenure as a National Science Foundation post- 
doctoral fellow. It is adapted from part of a 
Ph.D. dissertation submitted to the University 
of California, Berkeley (5). Some of the 
material was presented in a paper given at the 
1955 meeting of the American Psychological 
Association. Acknowledgment is due to 
Professor Leo Postman for many helpful 
suggestions and criticisms. 


experiment of Thorndike and Rock 
but asked Ss at the end of each block 
of 20 trials to state the principle 
according to which they were making 
their responses. They found a small 
but significant amount of LWA, but 
only for those Ss who eventually 
succeeded in stating the principle. 
Subsequently, Philbrick and Postman 
succeeded in obtaining a significant 
amount of improvement both in the 
group that achieved the principle and 
in the one that did not. The latter 
investigation made use of a principle 
that was “both simpler and easier to 
apply ...” (8, p. 424). It appears, 
therefore, that Ss can learn in a 
Thorndikian situation while they re- 
main unable to state the reasons for 
their responses. Recently LWA has 
also been demonstrated by the use of 
an operant procedure (1, 3, 11). 
Using the same experimental tech- 
nique as Philbrick and Postman, the 
present study addresses itself to three 
questions: (a) Can the evidence for 
LWA be attributed to the operation 
of partially correct hypotheses about 
the principle of correct response?
(b) What are the conditions of acqui- 
sition controlling the ability to ver- 
balize the principle? (c) Will perform- 
ance during extinction of Ss who 
become aware of the principle of 
correct response nevertheless vary as 
a function of the training conditions? 


Method


Task.—The procedure used by Philbrick and 
Postman was duplicated with such modifications 
as were required by the specific purposes of this
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investigation. The S was presented with a 
series of stimulus words varying in length from 
2 to 10 letters and was required to respond to 
each word with a number from one to nine. 
The response was called “Right” if the number
was equal to the number of letters in the stimulus
word minus one, e.g., test—3, telephone—8, etc.
All other responses were called “Wrong.” In
the Philbrick-Postman study the conditions of
reinforcement were not varied; all correct
responses were rewarded and all incorrect re-
sponses were punished. In the present study
both the type and the schedule of reinforcement 
were varied systematically. 
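
The scoring rule of the task can be stated compactly. The sketch below only restates the rule given above (correct number = number of letters minus one); the function names are illustrative and not part of the original procedure.

```python
def correct_number(stimulus_word):
    """The response counted as correct: number of letters minus one."""
    return len(stimulus_word) - 1


def score_response(stimulus_word, response):
    """Classify a response as 'Right' or 'Wrong' under the rule."""
    return "Right" if response == correct_number(stimulus_word) else "Wrong"


# Stimulus words ran from 2 to 10 letters, so correct numbers ran from 1 to 9.
examples = {word: correct_number(word) for word in ("test", "telephone")}
```

For the two words cited in the text this gives 3 for "test" and 8 for "telephone".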

Experimental design.—There were seven 
groups of 18 Ss each: six experimental groups 
and one control group. The six experimental 
groups formed a 2 X 3 factorial design. Two 
schedules of reinforcement were combined with 
three types of reinforcement. The two schedules
of reinforcement were 100% and 67%. The
three types of reinforcement were (a) reward
(announcement of “Right”) for correct re-
sponses, (b) punishment (announcement of
“Wrong”) for incorrect responses and (c) reward
plus punishment. Hereafter the three types of 
reinforcement will be referred to as R, P, and 
R + P, respectively. The control group re-
ceived no reinforcement.
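
The three types of reinforcement differ only in which outcomes E announces. The sketch below illustrates that difference for a trial on which reinforcement is given; it deliberately ignores the 100% and 67% schedules, because the text does not say how reinforced trials were selected under the partial schedule. The function name and the use of None for silence are assumptions.

```python
def announcement(is_correct, reinforcement_type):
    """What E says on a reinforced trial; None stands for silence.

    reinforcement_type: "R" (reward only), "P" (punishment only),
    or "R+P" (reward plus punishment).
    """
    if reinforcement_type not in {"R", "P", "R+P"}:
        raise ValueError("unknown reinforcement type")
    if is_correct and reinforcement_type in {"R", "R+P"}:
        return "Right"
    if not is_correct and reinforcement_type in {"P", "R+P"}:
        return "Wrong"
    return None
```

Under R an incorrect response is met with silence, and under P a correct response is met with silence, which is why the instructions had to keep silence from being read as feedback.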

Materials and procedure.—There were 324 
different stimulus words, arranged in 36 blocks 
of 9 words each, with one word of each length 
placed at random in every block. The order of 
the blocks was systematically rotated from S to 
S. The acquisition period consisted of a maxi- 
mum of 24 blocks of trials. The acquisition 
period was terminated when S reached a criterion 
of two successive errorless blocks or at the end of 
block 24, whichever came first. A constant 
number of 12 extinction blocks then followed for 
all Ss during which no reinforcement was 
administered. 

At the end of the first training block on which
S made four or more correct responses out of
nine, a criterion which is 3.13 standard errors
beyond a priori chance, he was asked to state the
principle on which his responses were based.2
These inquiries were then continued after every
subsequent block on which this criterion was
reached until S had either stated the correct
principle or reached the end of training.

2 The point at which S is first questioned
undoubtedly influences the time required to
achieve verbalization, the achievement of
verbalization, and the amount of LWA shown.
This parameter merits more careful examination
than has been possible in the present study.

Due to systematic response tendencies, e.g.,
guessing habits, the empirical chance level of
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responding is difficult to estimate. ‘The criterion 
of four or more correct out of nine seemed to 
provide a reasonable basis for determining when 
to question S. Since this criterion was reached 
at least once by 12 of the 18 control Ss, however, 
the present study makes it quite clear that it 
falls well within the limits of empirical chance 
and thus is not by itself sufficient evidence for 
LWA. 
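
The a priori chance level behind the four-out-of-nine criterion can be sketched with a simple binomial model. The assumptions here are not stated in the text: each of the nine responses in a block is treated as an independent guess with probability 1/9 of being correct. The paper does not give the authors' own computation.

```python
from math import comb, sqrt

N_TRIALS = 9        # words per block
P_CHANCE = 1 / 9    # assumption: nine equally likely response numbers
CRITERION = 4       # four or more correct in a block


def upper_tail(n, p, k):
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def z_score(n, p, k):
    """Distance of k from the chance expectation, in standard errors."""
    return (k - n * p) / sqrt(n * p * (1 - p))


tail = upper_tail(N_TRIALS, P_CHANCE, CRITERION)
z = z_score(N_TRIALS, P_CHANCE, CRITERION)
```

On this model the criterion lies roughly 3.2 standard errors above the chance expectation of one correct response, close to the printed 3.13, and the exact probability of reaching it in any single block is a little over 1 in 100. Over many blocks, and with systematic guessing habits, the criterion can still be reached fairly often without learning, which fits the finding for the control Ss.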

The experimental Ss who never reached the 
criterion and all the control Ss were questioned 
after the experiment to ascertain whether they 
had the correct hypothesis or partially correct 
hypotheses at any time during the experiment. 

Instructions.—The following instructions were 
read to each S: “I have prepared a list of pairs
of words and numbers. I am interested in 
seeing how many of the paired numbers you can 
guess correctly. I will show you the words and 
you guess the numbers that go with them. I 
have written the numbers on the paper on which 
I shall record your guesses. I have used num- 
bers from one through nine for each word, so 
you can choose from one through nine for each 
word. With this large selection of numbers you
will probably not make many correct guesses. 
It is a long list and I will go through it only once. 
Try to give your responses as quickly as you 
can.” The experimental groups were told (the 
expression appropriate to the type of reinforce-
ment administered was used): “On some trials
I will tell you that you are right (wrong, either 
right or wrong). However, if I say nothing, it 
does not necessarily mean that you are wrong 
(right, either right or wrong).” The control 
group was told: “Nothing will be said about 
your accuracy while we run the experiment.” 

These instructions were intended to convey a 
partial reinforcement “set” to all Ss (the Ss in 
the continuous R + P condition, though, would 
obviously receive reinforcement on every trial) 
in order to avoid having silence on the part of E
function as reward under the punishment con-
dition or as punishment under the reward
condition.

Subjects.—There were 126 Ss. All were
undergraduate students at the University of
California. Each S was assigned randomly to
one of the seven groups by means of a table of
random numbers with correction for equal N's
at the end. They did not know the purpose of
the experiment.


Results


Half of the 108 experimental Ss
were able to verbalize the correct
principle sometime during the learning
series; the other 54 were never able
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to state the rule governing the correct 
response. The results for the acqui- 
sition period of the successful and the 
unsuccessful Ss will be presented 
separately first and then the results 
for the extinction period will be con- 
sidered. 

Verbalization of the principle.—The 
rank order of the conditions in terms
of the number of Ss who achieved
verbalization was R + P > R > P,
and continuous reinforcement had
more verbalizers than partial rein- 
forcement. The differences among 
these conditions, however, are not 
statistically significant. 

The time at which the principle 
was successfully stated varied all the 
way from the first block of trials to 
Block 21. The mean numbers of
blocks preceding correct statement of
the principle were 7.4, 7.8, and 10.0
for the R + P, R, and P conditions,
respectively. These means are sig-
nificantly different beyond the .001
level (F = 9.88, df = 2, 51). The 
differences remain significant when 
the analysis of covariance is used to 
correct for either number of reinforce- 
ments or number of correct responses. 
When the data are analyzed in terms 
of the schedule of reinforcement there 
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are 8.3 and 8.0 blocks to verbaliza- 
tion, respectively; the difference be- 
tween the two schedules is not 
statistically significant. It appears
that time to verbalization is a much more
sensitive measure than the number of
successful Ss. These findings show
that punishment is the type of 
reinforcement that is least favorable 
to verbalization. 

Preverbalization performance.—Fig-
ure 1 presents the average number of
correct responses as a function of 
distance from the block of trials pre- 
ceding correct verbalization. Since 
the principle was verbalized at dif- 
ferent points in the series, the results 
of different Ss were made comparable 
by the use of Vincent curves (4, p. 
282). Following a log transformation 
which reduced the heterogeneity of 
variance, the Vincent scores were 
analyzed by Grant’s extension of 
Alexander’s test for trend (2). The 
test indicates that there is a significant 
improvement in the performance of 
the successful Ss well before they are 
able to verbalize the correct principle. 
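
The Vincent-curve step can be illustrated with one common form of the procedure: each S's preverbalization series is divided into the same number of equal fractions and the scores are averaged within each fraction, so that Ss with different numbers of blocks become comparable. This is a sketch of that general idea only. The number of fractions and the exact variant used in the present study are not stated, and the function name is hypothetical.

```python
def vincentize(scores, n_fractions):
    """Average a score series within n_fractions equal fractions of its length.

    A block that straddles a boundary contributes to both neighboring
    fractions in proportion to its overlap with each.
    """
    if n_fractions <= 0 or not scores:
        raise ValueError("need at least one score and one fraction")
    width = len(scores) / n_fractions
    result = []
    for f in range(n_fractions):
        start, end = f * width, (f + 1) * width
        weighted_sum = 0.0
        for i, score in enumerate(scores):
            # Block i occupies the interval [i, i + 1) on the block axis.
            overlap = min(end, i + 1) - max(start, i)
            if overlap > 0:
                weighted_sum += score * overlap
        result.append(weighted_sum / width)
    return result


# Hypothetical Ss with 4 and 3 preverbalization blocks, both reduced to halves.
halves_a = vincentize([1, 2, 3, 4], 2)
halves_b = vincentize([2, 4, 6], 2)
```

Once every S is reduced to the same number of fractions, the fractions can be averaged across Ss to give a group curve.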

A summary of the complete analysis
of acquisition for the experimental 
groups with and without the control 
group is presented in Table 1. The 
highly significant over-all trend indi- 
cates that the level of the average 
curve changes during the course of 
acquisition. Since only the linear 
component of the over-all trend is 
statistically significant, it may be 
concluded that the over-all trend is 
essentially linear. The differences in 
the linear components of the group 
trends are significant. In the analysis 
that excludes the control group, the 
differences in the linear components 
of the group trends do not reach 
significance while the over-all trend 
and its linear components attain 
higher levels of significance. It is 
therefore concluded that the rate of 
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Note.— Values for the higher-order components in the analysis of acquisition results have been omitted ; therefore, 


the dfs as listed do not add up to the total. 


improvement of the experimental 
groups is quite different from that of 
the control group and that a signifi- 
cant amount of LWA takes place 
prior to verbalization of the principle. 
The experimental groups do not, 
however, differ with respect to rate 
of LWA. Since the F between indi- 
vidual means is highly significant in 
both analyses, it is clear that reliable 
measures of performance have been 
used.

Improvement after verbalization.—
For all groups there is a large and 
significant increase in the number of 
correct responses on the block at the 
end of which the principle is correctly 
stated. These findings parallel those 
obtained by Postman and Jarrett and
more recently by Philbrick and Post-
man. In the latter study all suc-
cessful Ss eventually reached the
criterion of perfect performance, i.e., 
two successive errorless blocks. In 
the present study that finding is 
reproduced only by the 100% R + P 
condition, the condition that uses the 
same type and schedule of reinforce- 
ment as the earlier study. 

Partially correct hypotheses.—Can
the performance called LWA be
attributed to the operation of partially
correct hypotheses relating the length
of the stimulus word in some way to
the magnitude of the response num-
ber? Such hypotheses could account
for LWA. Of the 54 successful Ss, 
32 reported such hypotheses and 22 
did not. Of the 54 unsuccessful Ss, 29 
reported such hypotheses and 25 did 
not. Furthermore, 9 of the 18 control 
Ss reported such hypotheses. Seven 
of these reached the four or more 
correct out of nine criterion and two 
did not. Therefore the action of 
partially correct hypotheses cannot 
be invoked to rule out the automatic 
action of reinforcement, since the use 
of partially correct hypotheses does 
not discriminate between either the 
successful and the unsuccessful Ss or 
the experimental and the control Ss. 
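
The counts just given can be checked informally. What follows is a minimal sketch and not an analysis reported in the paper: it applies an uncorrected Pearson chi-square to the 2 x 2 table formed by the successful Ss (32 reporting, 22 not) and the unsuccessful Ss (29 reporting, 25 not). The function name is ours.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: successful / unsuccessful Ss (counts taken from the text).
# Columns: reported / did not report a partially correct hypothesis.
chi2 = chi_square_2x2(32, 22, 29, 25)
print(round(chi2, 3))
```

With 1 df the value required at the .05 level is 3.84; the statistic computed here falls well below it, in line with the statement that such hypotheses do not discriminate between the two classes of Ss.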
Performance of nonverbalizers.— 
There was no significant trend in the 
performance of the unsuccessful Ss. 
In this respect the Philbrick-Postman 
findings have not been repeated. It 
should be noted, however, that the 
effect obtained in their study was 
small and that it was obtained with 
a larger number of cases than was used 
in the present group having the same 
type and schedule of reinforcement. 
Extinction following verbalization.— 
Figure 2 presents the results for the 
extinction period and Table 1 sum- 
marizes the analysis of variance of 
these results. The highly significant 
over-all trend and quadratic com- 
ponent indicate that there is 
extinction and that the trend of the 
extinction is curvilinear. The sig- 
nificant differences between means 
indicate that the groups differ in the 
amount of extinction shown. The 
significant quadratic component for 
type indicates that the groups differ 
in the rate with which they ex- 
tinguish and that these differences in 
rate are a function of the type of 
reinforcement administered during 
acquisition. Thus it is clear that, 
although all successful Ss had ver- 
balized the principle, their persistence 
in applying it depends on the kind of 
training they receive. 

Figure 2 also reveals an unusual 
finding that merits further study: the 
continuous R +P group shows no 
extinction. Since all the other groups 
show some extinction, this appears 
to be an instance in which continuous 
reinforcement produces greater re- 
sistance to extinction than partial 
reinforcement. This result is all the 
more interesting when it is noted that 
continuous R + P is the only group 
that cannot be operating under the 
partial reinforcement “set” the in- 
structions were intended to impart. 
Possibly awareness is a parameter 
modulating the effects of the schedule 
of reinforcement variable. 


Discussion 


Our results support Thorndike’s hy- 
pothesis. It has been shown that the 
action of an aftereffect on S-R connec- 
tions can be automatic and independent 
of S’s understanding. We obtain a 
significant preverbalization rise in the 
acquisition curve of the successful Ss, 
but we do not find any improvement in 
the performance of the unsuccessful Ss. 

The results also indicate that the 
influence of partially correct hypotheses 
cannot account for the LWA perform- 
ance. Partially correct hypotheses have 
been stated by both experimental and 
control Ss who showed no distinguishable 
improvement in performance, and im- 
provement in performance has been 
obtained from Ss who could state no such 
hypotheses. 

We have shown that verbalization is a 
function of the same conditions of rein- 
forcement as is the acquisition of specific 
verbal or motor responses. It has been 
found that reward is more favorable to 
verbalization than punishment. Thus it 
should now be possible to study the 
conditions of verbalization by many of 
the techniques that have been developed 
for the study of these other responses. 

The fact that there are differences in 
the performance of the verbalizers during 
extinction which are systematically re- 
lated to the conditions of acquisition is 
theoretically quite important. To the 
extent that the present results are 
general, certain interpretations of “in- 
sight” in the explanation of behavior are 
open to question. In the light of these 
findings it is not possible to go along with 
Köhler in his acceptance of the con- 
viction of the layman who “believes that 
he often feels directly why he wants to 
do certain things in one situation and 
certain other things in a second” (7, 
p. 320). For Köhler this means that 
“the forces which principally determine 
his mental trends and his actions are for 
the most part given in experience” (7, 
p. 320). The present results demon- 
strate quite clearly that, even though 
what is directly given in experience may 
be common to a group of Ss, the behavior 
observed remains a function of the con- 
ditions of training. 


SUMMARY 


This experiment is concerned with the in- 
fluence of type and schedule of reinforcement on 
LWA and subsequent verbalization. Speed of 
verbalization was found to vary as a function of 
the type of reinforcement administered. Pun- 
ishment was the type of reinforcement that was 
least favorable to verbalization. LWA as 
measured by a significant preverbalization trend 
was found under the experimental conditions 
though there were no significant differences in 
trend between the conditions. 

The effect of partially correct hypotheses was 
examined and found insufficient to account for 
LWA performance. 

The extinction results are directly at variance 
with the Gestalt hypothesis that the forces which 
principally determine behavior are those which 
are phenomenologically given. Performance 
during extinction of Ss who had reached the 
same criterion of verbalization was found to 
vary as a function of the reinforcement condi- 
tions during training. 

The extinction results also furnish an instance 
in which continuous reinforcement is more 
resistant to extinction than partial reinforce- 
ment. 


GRADIENTS OF ERROR REINFORCEMENT IN NORMAL 
MULTIPLE-CHOICE LEARNING SITUATIONS 1 


MELVIN H. MARX 


University of Missouri 


Previous studies (1, 5, 6) have 
shown that guessing-sequence and 
other response-biasing factors can 
produce gradients similar to those 
reported by Thorndike (7, 8) around 
rewarded responses. Typically, no 
reward has been used in studies of 
guessing-sequence factors. 

The present study is concerned with 
whether such response biases can 
account wholly for the Thorndike 
effect. Results previously obtained 
under conditions of prearranged 
reward have indicated that, when 
these factors are controlled, there 
remains a significant tendency for 
errors immediately following repeated 
rewarded responses to be strengthened 
(3, 4; cf. also 2). The guessing- 
sequence type of artifactual explana- 
tion was eliminated in these studies 
by using repeated nonrewarded re- 
sponses as a base line for control 
purposes. 

The positive results obtained in 
these experiments led to the formu- 
lation of a modified Thorndikian 


1 I am grateful to the University of Missouri 
Research Council for a grant in partial support 
[...] production, translation, publication, use, and 
[...] United States Government. A report of this 
study was made at the Chicago meetings of the 
Midwestern Psychological Association in May, 
1953. 

spread-of-effect hypothesis (3). In 
brief, it holds that reward of a re- 
sponse in serial learning does 
strengthen, in a gradient manner, the 
nonrewarded responses that imme- 
diately follow. It differs from the 
original Thorndikian “spread” and 
“scatter” hypotheses (8) in these 
major respects: (a) It refers only to 
the after-gradient. (b) It assumes 
the dependence of the effect upon the 
repetition, or near repetition, of the 
rewarded response. Such repetition 
is seen as one of the most important 
components in the stimulus complex 
biasing toward repetition of the fol- 
lowing errors. (c) It is restricted to 
serial responses. 

The present study provides a test 
of whether the differential error 
strengthening predicted by this hy- 
pothesis occurs under natural rein- 
forcement conditions in serial 
multiple-choice learning (that is, when 
reward is not prearranged but occurs 
normally, in accordance with the 
conditions of the experiment as de- 
scribed to S). The problem was to 
determine whether errors made im- 
mediately following repeated  re- 
warded responses are strengthened 
more than errors made immediately 
following repeated nonrewarded re- 
sponses. This direct comparison was 
designed to permit evaluation of any 
residual error strength as a function 
of reward over and beyond the in- 
fluence of the guessing-sequence type 
of factor. 


METHOD 


Fifteen to twenty Ss were seated in chairs 
with small individual multiple-choice learning 
boxes on the arms. Each box had 30 rows of 
10 holes each in the top. The Ss were told that 
their task was to guess which of the 10 holes 
was correct for each of the 30 stimulus words 
called out by E. They were to remember which 
hole was correct, and repeat the correct response 
on trials after the first, as well as try to discover 
new correct holes. The S responded by inserting 
a stylus with a flashlight bulb in the top into the 
hole. The boxes were wired so that the bulb 
lighted when the correct hole was chosen. The 
correct hole was determined randomly for each 
box. Each time a new stimulus list was begun, 
Ss were given a different box with different 
correct responses. 

Two experiments were conducted, using the 
same apparatus and general procedure. In 
Exp. I, Ss were 43 students in a beginning 
psychology laboratory course at the University 
of Missouri. They were given eight orthodox 
Thorndikian multiple-choice tests spaced over 
a period of several weeks. Each test consisted 
of six trials on a new set of verbal stimulus terms 
(common two-syllable nouns) and number re- 
sponses (1 through 10). 

In Exp. II, Ss were 99 Air Force basic trainees 
at Lackland Air Force Base, Texas. They were 
given five trials on each of five lists within a single 
90-min. work period. 


RESULTS 


Experiment I 


The data presented were analyzed 
mainly as a function of the first repeated 
rewarded and first repeated nonrewarded 
responses in a trial. The first two re- 
sponses in the list were not used; the 
repeated response had to be followed by 
at least five nonrewarded responses. 
An additional restriction for selection of 
repeated responses on later trials was 
that responses repeated from preceding 
trials could not be used. 

The measure of response strength used 
was the difference between the number 
response made on the first trial and the 
number response made on the second 
trial. This measure has been found to 
be considerably more sensitive than the 
repetition-of-response measure (2). 
Lower difference scores indicate stronger 
responses—that is, less tendency toward 
variation of the first-trial response. 
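
The difference-score measure can be illustrated with a short sketch. It assumes that the score is the absolute difference between the two number responses (the text does not say whether the sign is kept), and the response pairs below are hypothetical rather than data from the experiment.

```python
def difference_score(first_trial, second_trial):
    """Absolute shift between the number responses on two trials (assumed)."""
    return abs(first_trial - second_trial)


def mean_difference_score(pairs):
    """Mean difference score over (first-trial, second-trial) response pairs."""
    scores = [difference_score(a, b) for a, b in pairs]
    return sum(scores) / len(scores)


# Hypothetical number responses (1 through 10) for three after-errors.
pairs = [(3, 3), (7, 5), (1, 4)]
print(mean_difference_score(pairs))
```

Under this reading an exact repetition scores 0, so a lower mean corresponds to a stronger response, as the text states.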

The tests were spaced over a period of 
several weeks, and only 23 Ss furnished 
complete records for the eight test 
periods. For these Ss difference scores 
for the first response following a repeated 
rewarded or repeated nonrewarded re- 
sponse were computed. In six of the 
eight cases, the error following the 
repeated rewarded response was stronger. 
The experimental mean was 2.49 com- 
pared to a control mean of 2.87. An 
analysis of variance of the data showed 
that the experimental-control difference 
(treatment difference) was significant 
beyond the .05 level (F = 4.72; 3.84 
required). 

Experimental and control after-gradi- 
ents were computed for all usable 
repeated responses over the first five 
trials for the first test (N = 41). These 
differ from the previous analysis both 
in that all repeated responses were used 
and in that the three error positions 
following repetitions were used. The 
experimental gradient (2.63, 3.05, 3.05) 
was significant (F = 18.00, P < .001). 
The control gradient (2.88, 2.88, 3.00) 
was not significant (F = 2.23; 3.06 
required for .05 level). No direct sta- 
tistical comparison was made because 
of heterogeneity of variance. However, 
the two first after-error positions were 
again compared by the t test, and the 
difference was significant (t = 2.98, 
P < .01). 

A second type of analysis was per- 
formed on the same data. The first- 
and second-trial first after-errors were 
correlated. This measure has proved 
advantageous because it is not inflated 
by possible differences in response distri- 
butions that may be associated with the 
effects of reward and that might produce 
artifactual gradients as measured by the 
usual difference-score method. All eight 
of the correlations were positive for the 
first after-error immediately following 
reward. The mean correlation was .23 
(P = .01). However, six of the eight 
comparable control correlations—those 
involving the first after-error following 
a repeated nonrewarded response—also 
were positive. The mean control cor- 
relation was .13. A direct comparison 
of these values, based on a z trans- 
formation, gave a critical ratio of 1.16, 
which was not significant. 

Experiment II 


First after-error difference scores were 
computed for the 87 airmen who pro- 
vided complete records. The first after- 
error immediately following the first 
repeated rewarded response was stronger 
than that following the first comparable 
control (nonrewarded) repeated response 
(2.15 compared to 2.64), and the over-all 
treatment difference was highly signifi- 
cant (F = 10.02, P < .001). 

The analysis done in Exp. I, using all 
repeated responses on the five trials on 
the first test, was repeated. The experi- 
mental gradient (2.18, 2.86, 2.85) was 
significant (F = 9.24, P < .001). The 
control gradient (2.62, 2.74, 2.90) was 
not significant (F = 2.90; 3.03 required 
at the .05 level). The variances were 
again heterogeneous, so the first after- 
positions were again compared by the 
t test. The experimental after-error 
was significantly stronger (t = 4.51, 
P < .001). 

Correlations were computed on the 
first test over the five trials. That is, 
the first after-errors for repeated re- 
sponses were correlated for Trials 1-2, 
2-3, 3-4, and 4-5, giving a total of four 
correlations based on the five trials on 
the first test. The four experimental- 
group correlations were .39, .22, .28, and 
.38, each significant beyond the .05 
level. The four control correlations were 
.23, .06, .17, and .19. Only one was 
significant. The mean experimental cor- 
relation of .31 was significant at the .01 
level, and the mean control correlation of 
.16 was significant at the .05 level. A 
z transformation and test showed that 
the experimental value was significantly 
greater (C. R. = 2.00, P < .05). 
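
The comparison of two correlations by a z transformation can be sketched as follows. This is the textbook Fisher procedure for two independent correlations, offered as an illustration and not as a reconstruction of the author's computation. The sample sizes in the example are hypothetical, since the text does not give the number of pairs behind each mean correlation, so the printed value is not expected to reproduce the reported critical ratio.

```python
import math


def fisher_z(r):
    """Fisher z transformation of a correlation coefficient."""
    return math.atanh(r)


def critical_ratio(r1, n1, r2, n2):
    """Difference between two independent correlations in standard-error units."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error


# Mean correlations from the text; n = 100 per group is a made-up value.
print(round(critical_ratio(0.31, 100, 0.16, 100), 2))
```

The ratio grows with the number of pairs, which is one reason a difference of the same size can be significant in one experiment and not in another.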

Since the experimental-control differ- 
ence was, in this case, significant, partial 
correlations were also run in order to 
control for the possibility that some side- 
effect of reward was creating a closer 
relationship between the rewarded re- 
sponse and the first after-error on both 
the trials. If this were true, an artificial 
relationship might show up in the 
intertrial correlation. 

When the relationship between the 
rewarded response and the first after- 
error on the two trials was partialled out, 
the resulting partial r’s in the experi- 
mental group were .41, .23, .28, and .38. 
All the control partial r’s were the same 
as before. Thus the partial r’s differed 
slightly more than the standard r’s, 
which indicates that the correlations 
were not artifactual. 
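
For readers unfamiliar with the procedure, a first-order partial correlation can be sketched as below. The paper does not say exactly which variables were partialled out or how, so the single-covariate formula and the input values are assumptions made for illustration only.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with a single third variable z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical values: when z is unrelated to x and y, nothing is removed.
print(partial_r(0.39, 0.0, 0.0))
# Hypothetical values: when z correlates with both, the partial r is lower.
print(round(partial_r(0.60, 0.50, 0.50), 3))
```

If the intertrial correlations had been produced by a shared relation to the rewarded response, partialling that relation out should have lowered them; the text reports that it did not.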


Discussion 


These results are quite consistent with 
previous experiments (3, 4) from this 
laboratory. The preponderance of the 
evidence supports the notion that 
guessing sequences can be responsible 
for the strengthening of responses, but 
that there are additional effects of reward 
upon the following responses. In the 
present two experiments, the mean 
control correlations for first after-errors 
following repeated responses were both 
positive. Yet, in both cases, the ex- 
perimental correlation was greater. In 
Exp. II, the difference was significant. 
The same general kind of relationship 
holds for the gradients; the same general 
relationship has held for the same types 
of measures in the previous experiments, 
cited above. 

Again, the best evidence for a modified 
Thorndikian hypothesis comes from the 
first after-errors, where experimental- 
control differences are generally signifi- 
cant. It may be that, where the first 
after-errors are eliminated from con- 
sideration, no significant gradient can 
be demonstrated. In each of the 
present experimental gradients the sig- 
nificant slope was entirely due to the 
strength of the first after-error. Also, 
the significant effects that occur usually 
depend upon the repetition of the re- 
warded response, as predicted by the 
modified Thorndikian hypothesis. 

This experiment, along with the previ- 
ous ones, thus offers strong support for 
this hypothesis. Reward appears to be 
an essential factor in the empirical 
Thorndike effect. Direct comparisons 
of the effects upon following errors of 
rewarded and nonrewarded response 
repetitions have not been reported by 
proponents of the guessing-sequence or 
other response-bias explanations. These 
factors are given some support in the 
present study but only as partial con- 
tributors to the results. A serious ques- 
tion is thus raised as to the adequacy 
of any claim that they are sufficient 
factors to explain the typical spread-of- 
effect data. 


SUMMARY 


Two experiments used natural rewards in a 
multiple-choice guessing situation to evaluate a 
modified Thorndikian hypothesis. The hy- 
pothesis holds that there is strengthening of 
errors because of proximity to reward, that this 
strengthening depends on the repetition of the 
rewarded response, that it occurs only for re- 
sponses following reward, and that the reward 
plays a dual role in the strengthening. 

The predicted gradients and correlations 
between error responses were found. The 
results are interpreted as giving strong support 
to the hypothesis within a different experimental 
situation than the one in which it was 
formulated. 


The fact that Ss demand more 
information in making decisions, the 
less reliable the information already 
obtained (3, 8) presumably reflects 
the value that they put upon being 
correct. On the other hand, the fact 
that they stop short of obtaining the 
maximum available information sug- 
gests that the time and effort of 
accumulating information operate as 
costs. Darwin is said to have col- 
lected data for twenty years before 
making public his views on natural 
selection (5, p. 465), which leads one 
to suppose that he was very anxious 
not to be mistaken. But no psy- 
chologist needs to be reminded that 
the number of Ss and trials he uses in 
testing a hypothesis is limited by the 
cost per datum in time, effort, and 
money. 

It has been argued (3, 4) that 
decisions in an “expanded judgment” 
situation are arrived at by processes 
analogous to those by which a scientist 
tests a hypothesis. If this is so, then 
Ss should require a relatively high 
level of reliability of information 
when the value of making a correct 
decision is large and the cost per unit 
of information is small, and vice 
versa. The present experiment was 
designed to determine whether, as 
predicted on these grounds, the num- 
ber of cards required for decisions in 
the expanded judgment situation 
would increase with the value of 
money prizes for correct decisions and 

1 This experiment was made possible by a 
grant from the Committee for the Advancement 
of Research of the University of Pennsylvania. 


University of Nevada 


would decrease with increased money 
cost per card. Further, since Ss’ 
ratings of their confidence in the 
correctness of their decisions vary 
with the reliability of the information 
upon which the decisions are based 
(4), it was desired to determine 
whether these ratings also would be 
affected by money prizes and costs. 


METHOD 


Subjects.—The Ss were 5 summer school 
students at the University of Pennsylvania and 
15 undergraduates at the University of Nevada, 
including 8 men and 12 women. They were 
obtained as volunteers from psychology courses, 
and were told that they would be paid $1.00 for 
their services. The results of the men and 
women were not analyzed separately, since sex 
differences have not appeared in previous ex- 
periments of the same general kind. 

Materials and procedure.—The materials and 
procedure were similar to those of Irwin and 
Smith (3) except as indicated below. The S 
was shown cards, each with a number stamped 
on it, one by one until he reported a decision 
that the mean of the whole pack was greater 
than, or less than, zero. No intermediate 
judgment was allowed. When he made his 
decision, he also rated his confidence in its 
correctness on a scale from -100 to +100 as 
used by Irwin, Smith, and Mayfield (4). 

TABLE 1 

Mean Number of Cards Required for Decisions and Mean Confidence Ratings for Each Value of Independent Variables 

[Table body not recoverable from the source. Surviving fragments: column values 2.0 and 7.5; entries 17.7 and 55.2; row labels “Cards required” and “Confidence ratings.”] 

The experiment was arranged in a 2 × 2 × 2 
× 2 × 20 factorial design with each S under- 
going all 16 conditions. The values of the 
independent variables were: for prize, $.50, 
$1.00; for cost per card, ½ cent, 1 cent; for mean 
of pack, .5, 1.5; and for SD of pack, 2, 7.5. 
Since each combination of mean and SD of pack 
had to be used four times with each S, precau- 
tions were taken to prevent recognition of the 
packs. These involved using negative means in 
half of the packs and altering the numbers in 
the packs so that no two packs had identical 
sets of numbers. It was possible to make these 
alterations without disturbing the means, but 
the SD’s ranged from 2.0 to 2.1 and from 7.5 
to 7.7. The order of the 16 conditions was 
varied from S to S in a counterbalanced way, so 
that each condition appeared in each position 
about equally often. The orders were irregular 
and were assigned to Ss at random. Results 
from packs with negative means were treated 
as if obtained from positive means, in view of 
the absence of differences associated with sign 
in the previous experiments. 
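
The structure of the design can be made concrete with a brief sketch that enumerates the 16 within-S conditions from the values given above. The variable names are ours; amounts are expressed in cents.

```python
from collections import Counter
from itertools import product

PRIZES_CENTS = (50, 100)       # $.50 and $1.00
COSTS_CENTS = (0.5, 1.0)       # one-half cent and one cent per card
PACK_MEANS = (0.5, 1.5)
PACK_SDS = (2.0, 7.5)

# Every combination of prize, cost, mean, and SD: 2 x 2 x 2 x 2 conditions.
conditions = list(product(PRIZES_CENTS, COSTS_CENTS, PACK_MEANS, PACK_SDS))
print(len(conditions))

# Number of conditions in which each (mean, SD) combination is presented.
uses = Counter((mean, sd) for _, _, mean, sd in conditions)
print(uses)
```

Each of the four mean-SD combinations recurs in four conditions, which is why the packs had to be disguised by sign changes and altered numbers.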

The S first judged a practice pack under 
instructions like those of Irwin and Smith (3) 
with additions having to do with confidence 
ratings. He was then instructed as follows: 

“All right, now we are going to start the main 
experiment. We will proceed in the same way, 
but we are going to give you a prize for each 
correct decision that you make, and we are going 
to charge you for each card that you look at. 
In the case of this first pack, you will receive 50 
cents (one dollar) if your decision is correct, but 
you will receive nothing if you are wrong. In 
either case, you will be charged one-half cent 
(one cent) for each card that you see. I will 
keep telling you how much you have spent on 
the cards. [After each card was shown, E 
informed S of the total spent so far on that 
pack.] Do you understand? 

“It is important to us that you try to win as 
much as you can. The money is from a research 
grant, not from our pockets, so try hard. 
[Volunteer Ss are sometimes reluctant to take 
money from E.] Your prizes will probably be 
more than you spend for cards, but of course 
you could lose money by making incorrect 
decisions about the averages of the packs and 
drawing too many cards. No matter what 
happens, we will not let you lose more than the 
dollar that we owe you for being a subject in 
the experiment. [This statement stemmed 
from a reluctance the converse of that just 
mentioned.] However, if you object to taking 
this chance, I will give you the dollar now and 
you may leave.” No S took advantage of this 
offer, which was intended to eliminate Ss with 
scruples against gambling. 
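
The payoff rule stated in the instructions amounts to simple arithmetic, sketched below for a single pack. The function name and the card counts are hypothetical; the sketch ignores the guarantee that S would not lose more than the dollar owed for the session.

```python
from fractions import Fraction


def net_cents(prize_cents, cost_cents_per_card, cards_seen, correct):
    """Net outcome for one pack: prize if correct, minus the charge for cards seen."""
    winnings = prize_cents if correct else 0
    return winnings - cost_cents_per_card * cards_seen


half_cent = Fraction(1, 2)

# Correct decision after 20 cards with the 50-cent prize and half-cent cost.
print(net_cents(50, half_cent, 20, True))
# Incorrect decision after 20 cards with the one-dollar prize and one-cent cost.
print(net_cents(100, 1, 20, False))
```

Exact fractions are used so that the half-cent charge accumulates without rounding error.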





[Table 1, displaced entries: 20.8; 16.3; 61.7, 43.0; 15.9] 


It was further made plain to S that a record 
of his wins and losses was being kept, but that 
he would not be informed of any outcomes until 
the end of the session. All Ss after the first were 
told that previous Ss had won as much as $10.00. 
The winnings averaged almost exactly this 
amount, and ranged from $6.87 to $11.59. 


RESULTS AND DISCUSSION 


The mean numbers of cards re- 
quired for decisions and the mean 
confidence ratings made at the time 
of decision are given in Table 1. 
Analyses of variance of the same 
variables appear in Table 2. The 
confidence ratings were treated alge- 
braically in these computations; i.e., 
ratings were given minus signs when 
the corresponding decisions were 
incorrect. 

TABLE 2 

Analyses of Variance of Number of Cards Required for Decision and of Confidence Ratings 

[Table body not recoverable from the source; the surviving significance footnotes read “< 5” and “P < .001.”] 

Note.—The first-order interactions (6, 7, 8, 9) were used to test the main effects (1, 2, 3, 4). The remainder term (16) was used to test all others shown. 

Considering first the numbers of 
cards required for decisions, the 
predictions that more cards would be 
used for the larger prize and the 
smaller cost were confirmed. Both 
differences were significant, although 
the mean difference between prizes 
of $.50 and $1.00 was only about 2 
cards. In addition, greater numbers 
of cards were used for the smaller 
absolute value of the mean of the 
pack and the larger value of the SD 
of the pack; this confirmed the 
findings of Irwin and Smith (3), as 
did also the significant difference 
among Ss. A barely significant inter- 
action between SD and S also ap- 
peared. This was not found pre- 
viously and, as the only one of 10 
first-order interactions to achieve the 
.05 level of significance, may fairly be 
neglected. 

As to the confidence ratings, no 
significant difference resulted from 
variation of prize (F = 2.6, P > .10), 
but the ratings were significantly 
higher with cost of ½ cent than of 1 
cent per card. The latter fact is 
presumably to be interpreted in the 
light of the fact that a larger number 
of cards was drawn when the cost was 
small, since confidence ratings in- 
crease with number of cards seen (4). 
However, it remains to be determined 
which, if either, of these two facts is 
prior to the other. Mean confidence 
ratings were reliably greater for the 
larger absolute value of the mean of 
the pack and the smaller value of the 
SD, although numbers of cards drawn 
were related to mean and SD in the 
opposite manner; this means that the 
numbers of cards drawn cannot have 
been the sole determiner of the con- 
fidence ratings. Significant indi- 
vidual differences occurred in the 
confidence ratings. One of the 10 
first-order interactions, M × SD, was 
significant at better than the .05 level; 
it reflected a low value in the combi- 
nation of small mean and large SD. 


While it has been possible in this and 
the preceding experiments to make 
correct predictions of the effects of a 
number of variables upon the infor- 
mation required for a decision and the 
confidence ratings associated with de- 
cisions, using as a basis the heuristic 
analogy between the problems con- 
fronted by Ss in these experiments and 
those of a scientist who is testing a 
hypothesis, it is of interest to consider 
other possibilities. First, there is the 
question whether the Ss have discovered 
and used some rational optimal strategy. 
This can apparently be answered in the 
negative, on the grounds that no such 
strategy is definable in the sense of theory 
of games and statistical decisions when 
S is not informed of the parameters of 
the distributions of numbers with which 
he is dealing or even of the form of these 
distributions. (It is therefore taking 
some liberty with the word to speak of 
S as gaining “information” from the 
cards.) We are acquainted with only 
one exception to this generalization,2 
namely, that if S regards E as malevo- 
lent, his minimax strategy would consist 
of making a decision by means of a 
chance device with P = .5 and drawing 
no cards. This follows from the fact 
that E could arrange distributions of 
numbers that would be highly mis- 
leading, as, e.g., by introducing a single 
number into a pack at random that was 
opposite in sign to the mean of the rest 
of the pack and large enough to pull the 
mean to its own side of zero. At any 
rate, whatever their distrust of E, no Ss 
adopted such a policy, the smallest 
number of cards drawn by any S under 
any condition being 4. 
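
The point about a misleading pack can be shown with a small worked example. The pack below is hypothetical and far smaller than those used in the experiment; it only illustrates how one card of opposite sign can pull the mean of the whole pack across zero.

```python
from statistics import mean


def flipping_threshold(pack):
    """Value that one added card must fall below to make a positive pack mean negative."""
    return -sum(pack)


def contaminated_mean(pack, extra_card):
    """Mean of the pack after a single extra card is inserted."""
    return mean(pack + [extra_card])


pack = [1, 2, 0, 3, -1, 2, 1, 0, 2, 1]   # hypothetical pack with a positive mean
print(mean(pack))
print(flipping_threshold(pack))
print(contaminated_mean(pack, -15))
```

An S who happened not to draw the extreme card would see only evidence for a positive mean, which is why sampling offers no protection against an E assumed to be malevolent.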


2 Suggested by Professor David Blackwell of 
the University of California in conversation 

A second conjecture supposes that S 
sets a certain absolute level of confidence 
in advance and draws cards until he 
attains this level on one side or the other, 
somewhat as a statistician might do in a 
sequential analysis. This can be re- 
jected on the grounds that the means of 
the absolute values of the confidence 
ratings varied significantly with the 
mean and SD of the cards, which would 
not have been the case if the final con- 
fidence ratings had been determined 
before the drawing. The mean absolute 
confidence ratings were 55 and 65 for 
packs with means of .5 and 1.5, respec- 
tively, the difference being significant 
beyond the .001 level when tested 
against the M × S interaction. Simi- 
larly, the mean absolute confidence 
ratings for packs with SD’s of 2 and 7.5 
were 66 and 57, respectively, the differ- 
ence again being reliable beyond the .001 
level as against the SD × S interaction. 
The absolute values of the confidence 
ratings did not vary significantly with 
prize or cost. 

A similar conjecture to the effect that 
S decides in advance the number of cards 
that he will draw is defeated by the fact 
that the number of cards drawn varied 
significantly with all of the experimental 
variables, as shown above. At least, if 
Ss made such decisions in advance, they 
did not adhere to them. 

It is an open question how far the 
original analogy to scientific decision- 
making can be pushed,3 successful as it 


3 Tanner and Swets (7, p. 403) in a study of 
signal detection have regarded their Ss as faced 
with “the task of testing a statistical hypothe- 
sis.” They state, further, that they were able 
to manipulate the “false alarm rate” by varying 
the money values and costs of correct and 
incorrect decisions. Since their Ss were in- 
formed of the probability of the existence of a 
signal, an optimal false alarm rate was definable. 
The experiments of Cohen and Hansel (2) 
resemble the present experiment more closely, 
in that their Ss estimated the numbers of beads 
of different colors in an urn on the basis of 
samples drawn from the urn. In a psycho- 
physical study, Parducci (6) suggests that S 
deals with a shift in a series of stimuli according 

has been up to now. With respect to 
matters not well codified in scientific 
method at its present stage of develop- 
ment, such as those having to do with 
values and costs,4 the procedures of the 
scientist may be less illuminating for the 
behavior of the Ss in the present experi- 
ment than these Ss’ behavior is for the 
ways of the scientist. 


SUMMARY 


In an expanded judgment situation, money 
prizes were given for correct decisions and money 
costs were charged for information. The mean 
number of cards required for decisions was 
greater for the larger of two prizes and the 
smaller of two costs. It was also greater for 
smaller absolute value of the mean and larger 
value of the SD of the numbers seen, in accord- 
ance with previous findings. Confidence ratings 
varied significantly with cost, mean, and SD, 
but not with prize. 

to whether the new series is seen as likely to have 
come from the same population as the prior 
series. 

4 Churchman (1), to give one example, has 
discussed the role of value judgments in experi- 
mental science. 
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